**Dwarkesh Patel:** You know what's crazy? That all of this is real.
**Ilya Sutskever:** Meaning what?
**Dwarkesh Patel:** Don't you think so? All this AI stuff and all this Bay Area… that it's happening. Isn't it straight out of science fiction?
**Dwarkesh Patel:** Another thing that's crazy is how normal the slow takeoff feels. The idea that we'd be investing 1% of GDP in AI, I feel like it would have felt like a bigger deal, whereas right now it just feels...
**Ilya Sutskever:** We get used to things pretty fast, it turns out. But also it's kind of abstract. What does it mean? It means that you see it in the news, that such and such company announced such and such dollar amount. That's all you see. It's not really felt in any other way so far.
**Dwarkesh Patel:** Should we actually begin here? I think this is an interesting discussion.
**Ilya Sutskever:** Sure.
**Dwarkesh Patel:** I think your point, about how from the average person's point of view nothing is that different, will continue being true even into the singularity.
**Ilya Sutskever:** No, I don't think so.
**Dwarkesh Patel:** Okay, interesting.
**Ilya Sutskever:** The thing which I was referring to not feeling different is, okay, such and such company announced some difficult-to-comprehend dollar amount of investment. I don't think anyone knows what to do with that. But I think the impact of AI is going to be felt. AI is going to be diffused through the economy. There'll be very strong economic forces for this, and I think the impact is going to be felt very strongly.
**Dwarkesh Patel:** When do you expect that impact?
**Ilya Sutskever:** I think the models seem smarter than their economic impact would imply.
**Dwarkesh Patel:** Yeah.
**Ilya Sutskever:** This is one of the very confusing things about the models right now. How to reconcile the fact that they are doing so well on evals? You look at the evals and you go, "Those are pretty hard evals." They are doing so well. But the economic impact seems to be dramatically behind. It's very difficult to make sense of: how can the model, on the one hand, do these amazing things, and then on the other hand, repeat itself twice in some situation?

An example would be, let's say you use vibe coding to do something. You get to some place and then you get a bug. Then you tell the model, "Can you please fix the bug?" And the model says, "Oh my God, you're so right. I have a bug. Let me go fix that." And it introduces a second bug. Then you tell it, "You have this new second bug," and it tells you, "Oh my God, how could I have done it? You're so right again," and brings back the first bug, and you can alternate between those.

How is that possible? I'm not sure, but it does suggest that something strange is going on. I have two possible explanations. The more whimsical explanation is that maybe RL training makes the models a little too single-minded and narrowly focused, a little bit too unaware, even though it also makes them aware in some other ways. Because of this, they can't do basic things.

But there is another explanation. Back when people were doing pre-training, the question of what data to train on was answered, because that answer was everything. When you do pre-training, you need all the data. So you don't have to think about whether it's going to be this data or that data. But when people do RL training, they do need to think. They say, "Okay, we want to have this kind of RL training for this thing and that kind of RL training for that thing." From what I hear, all the companies have teams that just produce new RL environments and add them to the training mix.

The question is, well, what are those RL environments? There are so many degrees of freedom. There is such a huge variety of RL environments you could produce. One thing you could do, and I think this is something that is done inadvertently, is that people take inspiration from the evals. You say, "Hey, I would love our model to do really well when we release it. I want the evals to look great. What would be RL training that could help on this task?" I think that is something that happens, and it could explain a lot of what's going on. If you combine this with the models' generalization actually being inadequate, that has the potential to explain a lot of what we are seeing: this disconnect between eval performance and actual real-world performance, which is something we don't even fully understand today.
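To make the degrees-of-freedom point concrete, here is a minimal sketch (not any lab's actual code; all names are hypothetical) of what an "RL training mix" assembled from environments might look like, and where eval-inspired selection can creep in:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the "RL environment" pipeline described above:
# each environment is a task generator plus a grader, and the training
# mix is just a curated list of them.

@dataclass
class RLEnvironment:
    name: str
    generate_task: Callable[[], str]      # produces a problem instance
    grade: Callable[[str, str], float]    # (task, solution) -> reward

def make_mix(envs: list[RLEnvironment],
             target_evals: list[str]) -> list[RLEnvironment]:
    # The degrees of freedom live here: nothing forces this choice, so it
    # is easy to inadvertently favor environments that resemble the evals
    # you want to look good on at release time.
    return [env for env in envs if any(ev in env.name for ev in target_evals)]

envs = [
    RLEnvironment("competition_coding", lambda: "two-sum", lambda t, s: 1.0),
    RLEnvironment("long_horizon_refactor", lambda: "refactor repo", lambda t, s: 0.0),
]

# Selecting by eval name means training ends up mirroring the eval suite.
training_mix = make_mix(envs, target_evals=["competition_coding"])
print([env.name for env in training_mix])  # ['competition_coding']
```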
**Dwarkesh Patel:** I like this idea that the real reward hacking is the human researchers who are too focused on the evals.
**Ilya Sutskever:** I think there are two ways to understand, or to try to think about, what you have just pointed out. One is that if it's the case that simply by becoming superhuman at a coding competition, a model will not automatically become more tasteful and exercise better judgment about how to improve your codebase, well then you should expand the suite of environments such that you're not just testing it on having the best performance in coding competition. It should also be able to make the best kind of application for X thing or Y thing or Z thing.

Another, maybe this is what you're hinting at, is to say, "Why should it be the case in the first place that becoming superhuman at coding competitions doesn't make you a more tasteful programmer more generally?" Maybe the thing to do is not to keep stacking up the amount and diversity of environments, but to figure out an approach which lets you learn from one environment and improve your performance on something else.

I have a human analogy which might be helpful. Let's take the case of competitive programming, since you mentioned that. Suppose you have two students. One of them decided they want to be the best competitive programmer, so they will practice 10,000 hours for that domain. They will solve all the problems, memorize all the proof techniques, and be very skilled at quickly and correctly implementing all the algorithms. By doing so, they became one of the best.

Student number two thought, "Oh, competitive programming is cool." Maybe they practiced for 100 hours, much less, and they also did really well. Which one do you think is going to do better in their career later on?
**Dwarkesh Patel:** The second.
**Ilya Sutskever:** Right. I think that's basically what's going on. The models are much more like the first student, but even more so. Because then we say, the model should be good at competitive programming, so let's get every single competitive programming problem ever. And then let's do some data augmentation so we have even more competitive programming problems, and we train on that. Now you've got this great competitive programmer.

With this analogy, I think it's more intuitive. Yeah, okay, if it's so well trained, all the different algorithms and all the different proof techniques are right at its fingertips. And it's more intuitive that with this level of preparation, it would not necessarily generalize to other things.
**Dwarkesh Patel:** But then what is the analogy for what the second student is doing before they do the 100 hours of fine-tuning?
**Ilya Sutskever:** I think they have "it." The "it" factor. When I was an undergrad, I remember there was a student like this that studied with me, so I know it exists.
**Dwarkesh Patel:** I think it's interesting to distinguish "it" from whatever pre-training does. One way to understand what you just said about not having to choose the data in pre-training is to say it's actually not dissimilar to the 10,000 hours of practice. It's just that you get that 10,000 hours of practice for free because it's already somewhere in the pre-training distribution. But maybe you're suggesting there's actually not that much generalization from pre-training. There's just so much data in pre-training, but it's not necessarily generalizing better than RL.
**Ilya Sutskever:** The main strength of pre-training is that: A, there is so much of it, and B, you don't have to think hard about what data to put into pre-training. It's very natural data, and it does include in it a lot of what people do: people's thoughts and a lot of the features. It's like the whole world as projected by people onto text, and pre-training tries to capture that using a huge amount of data.

Pre-training is very difficult to reason about because it's so hard to understand the manner in which the model relies on pre-training data. Whenever the model makes a mistake, could it be because something by chance is not as supported by the pre-training data? "Support by pre-training" is maybe a loose term. I don't know if I can add anything more useful on this.

I don't think there is a human analog to pre-training.
**Dwarkesh Patel:** Here are analogies that people have proposed for what the human analogy to pre-training is. I'm curious to get your thoughts on why they're potentially wrong. One is to think about the first 18, or 15, or 13 years of a person's life when they aren't necessarily economically productive, but they are doing something that is making them understand the world better and so forth. The other is to think about evolution as doing some kind of search for 3 billion years, which then results in a human lifetime instance. I'm curious if you think either of these are analogous to pre-training. How would you think about what lifetime human learning is like, if not pre-training?
**Ilya Sutskever:** I think there are some similarities between both of these and pre-training, and pre-training tries to play the role of both of these. But I think there are some big differences as well. The amount of pre-training data is very, very staggering.
**Dwarkesh Patel:** Yes.
**Ilya Sutskever:** Somehow a human being, after even 15 years with a tiny fraction of the pre-training data, they know much less. But whatever they do know, they know much more deeply somehow. Already at that age, you would not make mistakes that our AIs make.

There is another thing. You might say, could it be something like evolution? The answer is maybe. But in this case, I think evolution might actually have an edge. I remember reading about this case. One way in which neuroscientists can learn about the brain is by studying people with brain damage to different parts of the brain. Some people have the most strange symptoms you could imagine. It's actually really, really interesting.

One case that comes to mind is relevant. I read about this person who had some kind of brain damage, a stroke or an accident, that took out his emotional processing. So he stopped feeling any emotion. He still remained very articulate and he could solve little puzzles, and on tests he seemed to be just fine. But he felt no emotion. He didn't feel sad, he didn't feel anger, he didn't feel animated. He became somehow extremely bad at making any decisions at all. It would take him hours to decide on which socks to wear. He would make very bad financial decisions.

What does it say about the role of our built-in emotions in making us a viable agent, essentially? To connect to your question about pre-training, maybe if you are good enough at getting everything out of pre-training, you could get that as well. But that's the kind of thing which seems... Well, it may or may not be possible to get that from pre-training.
**Dwarkesh Patel:** "那个"是什么?显然不仅仅是情绪本身。
**Dwarkesh Patel:** What is "that"? Clearly not just directly emotion.
**Ilya Sutskever:** It seems like some almost value function-like thing which is telling you what the end reward for any decision should be.
**Dwarkesh Patel:** You think that doesn't sort of implicitly come from pre-training?
**Ilya Sutskever:** I think it could. I'm just saying it's not 100% obvious. But what is that? How do you think about emotions? What is the ML analogy for emotions? It should be some kind of a value function thing. But I don't think there is a great ML analogy because right now, value functions don't play a very prominent role in the things people do.
**Dwarkesh Patel:** It might be worth defining for the audience what a value function is, if you want to do that.
**Ilya Sutskever:** Certainly, I'll be very happy to do that. When people do reinforcement learning, the way reinforcement learning is done right now, how do people train those agents? You have your neural net and you give it a problem, and then you tell the model, "Go solve it." The model takes maybe thousands, hundreds of thousands of actions or thoughts or something, and then it produces a solution. The solution is graded. And then the score is used to provide a training signal for every single action in your trajectory. That means that if you are doing something that goes for a long time—if you're training on a task that takes a long time to solve—it will do no learning at all until you come up with the proposed solution. That's how reinforcement learning is done naively. That's how o1, R1 ostensibly are done.

The value function says something like, "Maybe I could sometimes, not always, tell you if you are doing well or badly." The notion of a value function is more useful in some domains than others. For example, when you play chess and you lose a piece, you know you messed up. You don't need to play the whole game to know that what you just did was bad, and therefore whatever preceded it was also bad. The value function lets you short-circuit the wait until the very end.

Let's suppose that you are doing some kind of a math thing or a programming thing, and you're trying to explore a particular solution or direction. After, let's say, a thousand steps of thinking, you conclude that this direction is unpromising. As soon as you conclude this, you could already get a reward signal a thousand timesteps previously, when you decided to pursue this path. You say, "Next time I shouldn't pursue this path in a similar situation," long before you actually came up with the proposed solution.
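The contrast between these two credit-assignment schemes can be shown numerically. Here is a toy sketch (hypothetical numbers, not any lab's training code): with only a terminal grade, every step gets the same signal; with a value function, the temporal-difference advantage flags the bad step as soon as the estimated value drops:

```python
import numpy as np

T = 6
terminal_reward = 0.0                     # the final solution was graded a failure
rewards = np.zeros(T)
rewards[-1] = terminal_reward

# Naive end-of-trajectory RL (the setup described above): every action in
# the trajectory receives the same terminal signal, and nothing is learned
# until the solution is graded.
naive_signal = np.full(T, terminal_reward)

# A value function's estimates of success along the way (hypothetical):
values = np.array([0.9, 0.9, 0.8, 0.2, 0.1, 0.0])
next_values = np.append(values[1:], 0.0)

# TD advantage r_t + V(s_{t+1}) - V(s_t): the drop from 0.8 to 0.2 at
# step 2 is flagged immediately, like losing a piece in chess.
td_advantage = rewards + next_values - values

print("naive:", naive_signal)             # [0. 0. 0. 0. 0. 0.]
print("td   :", td_advantage.round(2))    # [ 0.  -0.1 -0.6 -0.1 -0.1  0. ]
```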
**Dwarkesh Patel:** This was in the DeepSeek R1 paper—that the space of trajectories is so wide that maybe it's hard to learn a mapping from an intermediate trajectory to a value. And also given that, in coding for example, you'll have the wrong idea, then you'll go back, then you'll change something.
**Ilya Sutskever:** This sounds like such a lack of faith in deep learning. Sure, it might be difficult, but there's nothing deep learning can't do. My expectation is that a value function should be useful, and I fully expect that they will be used in the future, if not already.

What I was alluding to with the person whose emotional center got damaged is more that maybe it suggests that the value function of humans is modulated by emotions in some important way that's hardcoded by evolution. And maybe that is important for people to be effective in the world.
**Dwarkesh Patel:** That's the thing I was planning on asking you. There's something really interesting about emotions as a value function, which is that it's impressive that they have this much utility while still being rather simple to understand.
**Ilya Sutskever:** I have two responses. I do agree that compared to the kind of things we learn, the kind of AI we are talking about, emotions are relatively simple. They might even be so simple that maybe you could map them out in a human-understandable way. I think it would be cool to do.

In terms of utility though, I think there is a complexity-robustness tradeoff, where complex things can be very useful, but simple things are very useful in a very broad range of situations. One way to interpret what we are seeing is that we've got these emotions that evolved mostly from our mammal ancestors and then were fine-tuned a little bit while we were hominids, just a bit. We do have a decent amount of social emotions, which mammals may lack. But they're not very sophisticated. And because they're not sophisticated, they serve us so well in this world that is so different from the one we used to live in.

Actually, they also make mistakes. For example, our emotions… Well actually, I don't know. Does hunger count as an emotion? It's debatable. But I think, for example, our intuitive feeling of hunger is not succeeding in guiding us correctly in this world with an abundance of food.
**Dwarkesh Patel:** People have been talking about scaling data, scaling parameters, scaling compute. Is there a more general way to think about scaling? What are the other scaling axes?
**Ilya Sutskever:** Here's a perspective that I think might be true. The way ML used to work is that people would just tinker with stuff and try to get interesting results. That's what's been going on in the past. Then the scaling insight arrived. Scaling laws, GPT-3, and suddenly everyone realized we should scale.

This is an example of how language affects thought. "Scaling" is just one word, but it's such a powerful word because it informs people what to do. They say, "Let's try to scale things." So you say, what are we scaling? Pre-training was the thing to scale. It was a particular scaling recipe. The big breakthrough of pre-training was the realization that this recipe is good. You say, "Hey, if you mix some compute with some data into a neural net of a certain size, you will get results. You will know that you'll do better if you just scale the recipe up."

This is also great. Companies love this because it gives you a very low-risk way of investing your resources. It's much harder to invest your resources in research. Compare the two. If you do research, you need to say, "Go forth, researchers, and research and come up with something," versus get more data, get more compute. You know you'll get something from pre-training.

Indeed, based on various things some people say on Twitter, it appears that Gemini may have found a way to get more out of pre-training.

At some point though, pre-training will run out of data. The data is very clearly finite. What do you do next? Either you do some kind of souped-up pre-training, a different recipe from the one you've done before, or you're doing RL, or maybe something else. But now that compute is big, compute is now very big, in some sense we are back to the age of research.

Maybe here's another way to put it. From 2012 to 2020, it was the age of research. Then, from 2020 to 2025, it was the age of scaling—maybe plus or minus, let's add error bars to those years—because people say, "This is amazing. You've got to scale more. Keep scaling." The one word: scaling. But now the scale is so big. Is the belief really, "Oh, it's so big, but if you had 100x more, everything would be so different?" It would be different, for sure. But is the belief that if you just 100x the scale, everything would be transformed? I don't think that's true.

So it's back to the age of research again, just with big computers.
**Dwarkesh Patel:** That's a very interesting way to put it. But let me ask you the question you just posed then. What are we scaling, and what would it mean to have a recipe? In pre-training there was a very clean relationship, almost like a law of physics: a power law between data or compute or parameters and loss. I guess I'm not aware of such a relationship beyond pre-training. What is the kind of relationship we should be seeking, and how should we think about what this new recipe might look like?
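For reference, the "law of physics"-like relationship in pre-training alluded to here is usually written as a power law in parameters and data. One commonly cited form is the Chinchilla fit (Hoffmann et al., 2022); this is context from the literature, not something stated in the conversation:

```latex
% Pre-training loss as a function of parameter count N and training tokens D,
% with irreducible loss E; reported fits: E ~ 1.69, alpha ~ 0.34, beta ~ 0.28.
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
```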
**Ilya Sutskever:** We've already witnessed a transition from one type of scaling to a different type of scaling, from pre-training to RL. Now people are scaling RL. Based on what people say on Twitter, they spend more compute on RL than on pre-training at this point, because RL can actually consume quite a bit of compute. You do very long rollouts, so it takes a lot of compute to produce those rollouts. Then you get a relatively small amount of learning per rollout, so you really can spend a lot of compute.

I wouldn't even call it scaling. I would say, "Hey, what are you doing? Is the thing you are doing the most productive thing you could be doing? Can you find a more productive way of using your compute?"

We've discussed the value function business earlier. Maybe once people get good at value functions, they will be using their resources more productively. If you find a whole other way of training models, you could say, "Is this scaling or is it just using your resources?" I think it becomes a little bit ambiguous.

In the sense that, when people were in the age of research back then, it was, "Let's try this and this and this. Let's try that and that and that. Oh, look, something interesting is happening." I think there will be a return to that.
**Dwarkesh Patel:** If we're back in the era of research, stepping back, what is the part of the recipe that we need to think most about? When you say value function, people are already trying the current recipe, but then having LLM-as-a-Judge and so forth. You could say that's a value function, but it sounds like you have something much more fundamental in mind. Should we even rethink pre-training at all and not just add more steps to the end of that process?
**Ilya Sutskever:** The discussion about the value function was interesting. I want to emphasize that I think the value function is something that's going to make RL more efficient, and I think that makes a difference. But anything you can do with a value function, you can do without one, just more slowly.

The thing which I think is the most fundamental is that these models somehow just generalize dramatically worse than people. It's super obvious. That seems like a very fundamental thing.
**Dwarkesh Patel:** So this is the crux: generalization.
There are two sub-questions. There's one which is about sample efficiency: why should it take so much more data for these models to learn than humans? There's a second question. Even separate from the amount of data it takes, why is it so much harder to teach the thing we want to a model than to a human?

For a human, we don't necessarily need a verifiable reward to be able to… You're probably mentoring a bunch of researchers right now, and you're talking with them, you're showing them your code, and you're showing them how you think. From that, they're picking up your way of thinking and how they should do research. You don't have to set a verifiable reward for them that's like, "Okay, this is the next part of the curriculum, and now this is the next part of your curriculum. Oh, this training was unstable." There's not this schleppy, bespoke process.

Perhaps these two issues are actually related in some way, but I'd be curious to explore this second thing, which is more like continual learning, and this first thing, which feels just like sample efficiency.
**Ilya Sutskever:** One possible explanation for the human sample efficiency that needs to be considered is evolution. Evolution has given us a small amount of the most useful information possible. For things like vision, hearing, and locomotion, I think there's a pretty strong case that evolution has given us a lot.

For example, human dexterity far exceeds… I mean, robots can become dexterous too if you subject them to a huge amount of training in simulation. But training a robot in the real world to quickly pick up a new skill like a person does seems very out of reach. Here you could say, "Oh yeah, locomotion. All our ancestors needed great locomotion, squirrels and so on. So with locomotion, maybe we've got some unbelievable prior." You could make the same case for vision.

I believe Yann LeCun made the point that children learn to drive after 10 hours of practice, which is true. But our vision is so good. At least for me, I remember myself being a five-year-old. I was very excited about cars back then. I'm pretty sure my car recognition was already more than adequate for driving as a five-year-old. You don't get to see that much data as a five-year-old. You spend most of your time in your parents' house, so you have very low data diversity. But you could say maybe that's evolution too.

But in language and math and coding, probably not. There, human learning still seems better than the models'. Obviously, models are better than the average human at language, math, and coding. But are they better than the average human at learning?
**Dwarkesh Patel:** Oh yeah. Oh yeah, absolutely.
**Ilya Sutskever:** What I meant to say is that language, math, and coding—and especially math and coding—suggest that whatever it is that makes people good at learning is probably not so much a complicated prior, but something more fundamental.
**Dwarkesh Patel:** I'm not sure I understood. Why should that be the case?
**Ilya Sutskever:** So consider a skill in which people exhibit some kind of great reliability. If the skill is one that was very useful to our ancestors for many millions of years, hundreds of millions of years, you could argue that maybe humans are good at it because of evolution, because we have a prior, an evolutionary prior that's encoded in some very non-obvious way that somehow makes us so good at it. But if people exhibit great ability, reliability, robustness, and ability to learn in a domain that really did not exist until recently, then this is more an indication that people might have just better machine learning, period.
**Dwarkesh Patel:** How should we think about what that is? What is the ML analogy?
**Ilya Sutskever:** There are a couple of interesting things about it. It takes fewer samples. It's more unsupervised. A child learning to drive a car… Children are not learning to drive a car. A teenager learning how to drive a car is not exactly getting some prebuilt, verifiable reward. It comes from their interaction with the machine and with the environment. It takes much fewer samples. It seems more unsupervised. It seems more robust?
**Dwarkesh Patel:** Much more robust.
**Ilya Sutskever:** The robustness of people is really staggering.
**Dwarkesh Patel:** Do you have a unified way of thinking about why all these things are happening at once? What is the ML analogy that could realize something like this?
**Ilya Sutskever:** One of the things that you've been asking about is how the teenage driver can self-correct and learn from their experience without an external teacher. The answer is that they have their value function. They have a general sense which is also, by the way, extremely robust in people. Whatever the human value function is, with a few exceptions around addiction, it's actually very, very robust.

So for something like a teenager that's learning to drive, they start to drive, and they immediately have a sense of how they're driving, how badly they're doing, how unconfident they are. And then they see, "Okay." And then, of course, the learning speed of any teenager is so fast. After 10 hours, you're good to go.
**Dwarkesh Patel:** It seems like humans have some solution, but I'm curious about how they are doing it and why is it so hard? How do we need to reconceptualize the way we're training models to make something like this possible?
**Ilya Sutskever:** That is a great question to ask, and it's a question I have a lot of opinions about. But unfortunately, we live in a world where not all machine learning ideas are discussed freely, and this is one of them.

There's probably a way to do it. I think it can be done. The fact that people are like that is, I think, proof that it can be done. There may be another blocker though, which is the possibility that human neurons do more compute than we think. If that is true, and if that plays an important role, then things might be more difficult.

But regardless, I do think it points to the existence of some machine learning principle that I have opinions on. Unfortunately, circumstances make it hard to discuss in detail.
**Dwarkesh Patel:** Nobody listens to this podcast, Ilya.
**Dwarkesh Patel:** I'm curious. If you say we are back in an era of research, you were there from 2012 to 2020. What is the vibe now going to be if we go back to the era of research? For example, even after AlexNet, the amount of compute that was used to run experiments kept increasing, and the size of frontier systems kept increasing. Do you think now that this era of research will still require tremendous amounts of compute? Do you think it will require going back into the archives and reading old papers? You were at Google and OpenAI and Stanford, these places, when there was more of a vibe of research? What kind of things should we be expecting in the community?
**Ilya Sutskever:** One consequence of the age of scaling is that scaling sucked out all the air in the room. Because scaling sucked out all the air in the room, everyone started to do the same thing. We got to the point where we are in a world where there are more companies than ideas, by quite a bit.

Actually, on that, there is this Silicon Valley saying that ideas are cheap, execution is everything. People say that a lot, and there is truth to that. But then I saw someone say on Twitter something like, "If ideas are so cheap, how come no one's having any ideas?" And I think that's true too.

If you think about research progress in terms of bottlenecks, there are several bottlenecks. One of them is ideas, and another is your ability to bring them to life, which might be compute but also engineering. If you go back to the '90s, say, you had people who had pretty good ideas, and if they had much larger computers, maybe they could have demonstrated that their ideas were viable. But they could not, so they could only have a very, very small demonstration that did not convince anyone. So the bottleneck was compute.

Then in the age of scaling, compute increased a lot. Of course, there is a question of how much compute is needed, but compute is large. Compute is large enough that it's not obvious you need that much more of it to prove some idea.

I'll give you an analogy. AlexNet was built on two GPUs. That was the total amount of compute used for it. The transformer was built on 8 to 64 GPUs. No single transformer paper experiment used more than 64 GPUs of 2017, which would be like, what, two GPUs of today? Same with the ResNet, right? You could argue that the o1 reasoning work was not the most compute-heavy thing in the world either.

So for research, you definitely need some amount of compute, but it's far from obvious that you need the absolute largest amount of compute ever for research. You might argue, and I think it is true, that if you want to build the absolutely best system, then it helps to have much more compute. Especially if everyone is within the same paradigm, compute becomes one of the big differentiators.
**Dwarkesh Patel:** I'm asking you for the history, because you were actually there. I'm not sure what actually happened. It sounds like it was possible to develop these ideas using minimal amounts of compute. But the transformer didn't immediately become famous. It became the thing everybody started doing and then started experimenting on top of and building on top of because it was validated at higher and higher levels of compute.
**Ilya Sutskever:** Correct.

**Dwarkesh Patel:** And if you at SSI have 50 different ideas, how will you know which one is the next transformer and which one is brittle, without having the kinds of compute that other frontier labs have?

**Ilya Sutskever:** I can comment on that. The short comment is that you mentioned SSI. Specifically for us, the amount of compute that SSI has for research is really not that small. I want to explain why. Simple math can explain why the amount of compute we have for research is more comparable than one might think.

SSI has raised $3 billion, which is a lot by any absolute sense. But you could say, "Look at the other companies raising much more." Well, a lot of their compute goes to inference. These big numbers, these big loans, are earmarked for inference. That's number one.

Number two, if you want to have a product on which you do inference, you need to have a big staff of engineers, salespeople. A lot of the research needs to be dedicated to producing all kinds of product-related features. So when you look at what's actually left for research, the difference becomes a lot smaller.

The other thing is, if you are doing something different, do you really need the absolute maximal scale to prove it? I don't think that's true at all. I think that in our case, we have sufficient compute to prove, to convince ourselves and anyone else, that what we are doing is correct.
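As a rough illustration of the "simple math" being gestured at here, a sketch in which the only figure taken from the conversation is SSI's $3 billion raise; the competitor's budget and every fraction are invented purely for illustration:

```python
# Hypothetical budget-allocation arithmetic. Only the $3B SSI raise
# comes from the conversation; all other numbers are made up.

ssi_raise = 3e9
ssi_research_fraction = 0.8            # assumption: little product/inference spend

big_lab_raise = 30e9                   # hypothetical larger competitor
big_lab_inference_fraction = 0.6       # hypothetical: earmarked for serving
big_lab_product_fraction = 0.2         # hypothetical: product engineering, sales

ssi_research = ssi_raise * ssi_research_fraction
big_lab_research = big_lab_raise * (1 - big_lab_inference_fraction
                                      - big_lab_product_fraction)

# Under these assumptions, a 10x gap in total funding shrinks to ~2.5x
# in compute actually available for research.
print(f"SSI research:     ${ssi_research / 1e9:.1f}B")
print(f"Big lab research: ${big_lab_research / 1e9:.1f}B")
print(f"ratio: {big_lab_research / ssi_research:.1f}x")
```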
**Dwarkesh Patel:** There have been public estimates that companies like OpenAI are already spending on the order of $5-6 billion a year just on experiments. This is separate from the amount of money they're spending on inference and so forth. So it seems like they're spending more per year on running research experiments than you guys have in total funding.
**Ilya Sutskever:** I think it's a question of what you do with it. It's a question of what you do with it. In their case, in the case of others, there is a lot more demand on the training compute. There's a lot more different work streams, there are different modalities, there is just more stuff. So it becomes fragmented.
**Dwarkesh Patel:** How will SSI make money?
**Ilya Sutskever:** My answer to this question is something like this. Right now, we just focus on the research, and then the answer to that question will reveal itself. I think there will be lots of possible answers.
**Dwarkesh Patel:** Is SSI's plan still to straight shot superintelligence?
**Ilya Sutskever:** Maybe. I think there is merit to it. I think there's a lot of merit, because it's very nice to not be affected by the day-to-day market competition. But there are two reasons that may cause us to change the plan. One is pragmatic: if timelines turn out to be long, which they might.

Second, I think there is a lot of value in the best and most powerful AI being out there impacting the world. I think this is a meaningfully valuable thing.
**Dwarkesh Patel:** So then why is your default plan to straight shot superintelligence? Because it sounds like OpenAI, Anthropic, all these other companies, their explicit thinking is, "Look, we have weaker and weaker intelligences that the public can get used to and prepare for." Why is it potentially better to build a superintelligence directly?
**Ilya Sutskever:** I'll make the case for and against. The case for is that one of the challenges people face when they're in the market is that they have to participate in the rat race. The rat race is quite difficult in that it exposes you to difficult trade-offs which you need to make. It is nice to say, "We'll insulate ourselves from all this and just focus on the research, and come out only when we are ready, and not before."

But the counterpoint is valid too, and those are opposing forces. The counterpoint is, "Hey, it is useful for the world to see powerful AI. It is useful for the world to see powerful AI because that's the only way you can communicate it." Well, I guess not even just that you can communicate the idea—
**Dwarkesh Patel:** Communicate the AI, not the idea. Communicate the AI.
**Ilya Sutskever:** What do you mean, "communicate the AI"? Let's suppose you write an essay about AI, and the essay says, "AI is going to be this, and AI is going to be that, and it's going to be this." You read it and you say, "Okay, this is an interesting essay." Now suppose you see an AI doing this, an AI doing that. It is incomparable.

Basically, I think that there is a big benefit from AI being in the public, and that would be a reason for us to not be quite straight shot. I guess it's not even just that, but I do think that is an important part of it.

The other big thing is that I can't think of another discipline in human engineering and research where the end artifact was made safer mostly through just thinking about how to make it safe. Consider why airplane crashes per mile are so much lower today than they were decades ago, or why it is so much harder to find a bug in Linux than it would have been decades ago. I think it's mostly because these systems were deployed to the world. You noticed failures, those failures were corrected, and the systems became more robust.

I'm not sure why AGI and superhuman intelligence would be any different, especially given—and I hope we're going to get to this—that the harms of superintelligence are not just about having some malevolent paper clipper out there. This is a really powerful thing, and we don't even know how to conceptualize how people will interact with it, what people will do with it. Having gradual access to it seems like a better way to spread out its impact and to help people prepare for it.
**Dwarkesh Patel:** Well I think on this point, even in the straight shot scenario, you would still do a gradual release of it, that's how I would imagine it.
**Ilya Sutskever:** Gradualism would be an inherent component of any plan. It's just a question of what is the first thing that you get out of the door. That's number one.

Number two, I believe you have advocated for continual learning more than other people, and I actually think that this is an important and correct thing. Here is why. I'll give you another example of how language affects thinking. In this case, it will be two words that have shaped everyone's thinking, I maintain.

First word: AGI. Second word: pre-training.

Let me explain. The term AGI, why does this term exist? It's a very particular term. Why does it exist? There's a reason. The reason that the term AGI exists is, in my opinion, not so much because it's a very important, essential descriptor of some end state of intelligence, but because it is a reaction to a different term that existed, and that term is narrow AI.

If you go back to the ancient history of gameplay AI, of checkers AI, chess AI, computer games AI, everyone would say, look at this narrow intelligence. Sure, the chess AI can beat Kasparov, but it can't do anything else. It is so narrow, artificial narrow intelligence. So in response, as a reaction to this, some people said, this is not good. It is so narrow. What we need is general AI, an AI that can just do all the things. That term got a lot of traction.

The second thing that got a lot of traction is pre-training, specifically the recipe of pre-training. I think the way people do RL now is maybe undoing the conceptual imprint of pre-training. But pre-training had this property. You do more pre-training and the model gets better at everything, more or less uniformly. General AI. Pre-training gives AGI.

But the thing that happened with AGI and pre-training is that in some sense they overshot the target. If you think about the term "AGI", especially in the context of pre-training, you will realize that a human being is not an AGI. Yes, there is definitely a foundation of skills, but a human being lacks a huge amount of knowledge. Instead, we rely on continual learning.

So when you think, "Okay, let's suppose that we achieve success and we produce some kind of safe superintelligence," the question is, how do you define it? Where on the curve of continual learning is it going to be? I produce a superintelligent 15-year-old that's very eager to go. They don't know very much at all, a great student, very eager to learn. You go and be a programmer, you go and be a doctor, go and learn.

So you could imagine that the deployment itself will involve some kind of a learning trial-and-error period. It's a process, as opposed to you dropping the finished thing.

**Dwarkesh Patel:** I see. You're suggesting that the thing you're pointing at with superintelligence is not some finished mind which knows how to do every single job in the economy. Because the way, say, the original OpenAI charter defines AGI is that it can do every single job, every single thing a human can do. You're proposing instead a mind which can learn to do every single job, and that is superintelligence.

**Ilya Sutskever:** Yes.

**Dwarkesh Patel:** But once you have the learning algorithm, it gets deployed into the world the same way a human laborer might join an organization.

**Ilya Sutskever:** Exactly.

**Dwarkesh Patel:** It seems like one of these two things might happen, or maybe neither of them does.

One, this super-efficient learning algorithm becomes superhuman, becomes as good as you and potentially even better at the task of ML research. As a result, the algorithm itself becomes more and more superhuman.

The other is, even if that doesn't happen, if you have a single model—and this is explicitly your vision—where instances of a model are deployed through the economy doing different jobs, learning how to do those jobs, continually learning on the job, picking up all the skills that any human could pick up, but picking them all up at the same time, and then amalgamating their learnings, you basically have a model which functionally becomes superintelligent even without any sort of recursive self-improvement in software. Because you now have one model that can do every single job in the economy, and humans can't merge our minds in the same way. So do you expect some sort of intelligence explosion from broad deployment?

**Ilya Sutskever:** I think it is likely that we will have rapid economic growth. With broad deployment, there are two arguments you could make which are conflicting. One is that once you do get to a point where you have an AI that can learn to do things quickly, and you have many of them, there will be a strong force to deploy them in the economy, unless there is some kind of regulation that stops it, which by the way there might be. The idea of very rapid economic growth for some time, I think, is very possible from broad deployment. The question is how rapid it's going to be. I think this is hard to know, because on the one hand you have this very efficient worker, while on the other hand the world is just really big and there's a lot of stuff, and that stuff moves at a different speed. But then on the other hand, now the AI could… So I think very rapid economic growth is possible. We will see all kinds of things, like different countries with different rules, and in the ones which have the friendlier rules, the economic growth will be faster. Hard to predict.

**Dwarkesh Patel:** It seems to me that this is a very precarious situation to be in. In the limit, we know that this should be possible. If you have something that is as good as a human at learning, but which can merge different instances in a way that humans can't, this already seems like a thing that should physically be possible. Humans are possible, digital computers are possible; you just need both of those combined to produce this thing. It also seems this kind of thing is extremely powerful. Economic growth is one way to put it. A Dyson sphere is a lot of economic growth. But another way to put it is that you will have, in potentially a very short period of time... You hire people at SSI, and in six months, they're net productive, probably. A human learns really fast, and this thing is becoming smarter and smarter very fast. How do you think about making that go well? Why is SSI positioned to do that well? What is SSI's plan there, is basically what I'm trying to ask.

**Ilya Sutskever:** One of the ways in which my thinking has been changing is that I now place more importance on AI being deployed incrementally and in advance. One very difficult thing about AI is that we are talking about systems that don't yet exist, and it's hard to imagine them. I think that one of the things happening in practice is that it's very hard to feel the AGI. It's very hard to feel the AGI. We can talk about it, but imagine having a conversation about what it is like to be old when you're old and frail. You can have a conversation, you can try to imagine it, but it's just hard, and you come back to reality where that's not the case.

I think that a lot of the issues around AGI and its future power stem from the fact that it's very difficult to imagine. Future AI is going to be different. It's going to be powerful. Indeed, what is the whole problem of AI and AGI? The whole problem is the power. When the power is really big, what's going to happen?

One of the ways in which I've changed my mind over the past year—and that change of mind, I'll hedge a little bit, may back-propagate into the plans of our company—is that if it's hard to imagine, what do you do? You've got to be showing the thing. I maintain that most people who work on AI also can't imagine it, because it's too different from what people see on a day-to-day basis. Here's something which I predict will happen. This is a prediction. I maintain that as AI becomes more powerful, people will change their behaviors. We will see all kinds of unprecedented things which are not happening right now.

I'll give some examples. I think, for better or worse, the frontier companies will play a very important role in what happens, as will the government. The kind of things that I think you'll see, which you see the beginnings of, are companies that are fierce competitors starting to collaborate on AI safety. You may have seen OpenAI and Anthropic doing a first small step, but that did not exist before. That's something I predicted in one of my talks about three years ago, that such a thing would happen. I also maintain that as AI continues to become more powerful, more visibly powerful, there will also be a desire from governments and the public to do something. I think this is a very important force, showing the AI. That's number one.

Number two, okay, so the AI is being built. What needs to be done? One thing that I maintain will happen is this: right now, to the people who are working on it, the AI doesn't feel powerful, because of its mistakes. I do think that at some point the AI will actually start to feel powerful. I think when that happens, we will see a big change in the way all AI companies approach safety. They'll become much more paranoid. I say this as a prediction that we will see happen. We'll see if I'm right. But I think this is something that will happen because they will see the AI becoming more powerful. Everything that's happening right now, I maintain, is because people look at today's AI and it's hard to imagine the future AI.

There is a third thing which needs to happen. I'm talking about it in broader terms, not just from the perspective of SSI, because you asked me about our company. The question is, what should the companies aspire to build? There has been one big idea that everyone has been locked into, which is the self-improving AI. Why did it happen? Because there are fewer ideas than companies. But I maintain that there is something better to build, and I think that everyone will want it: the AI that's robustly aligned to care about sentient life specifically. In particular, there's a case to be made that it will be easier to build an AI that cares about sentient life than an AI that cares about human life alone, because the AI itself will be sentient. And if you think about things like mirror neurons and human empathy for animals—which you might argue is not big enough, but it exists—I think it's an emergent property of the fact that we model others with the same circuit that we use to model ourselves, because that's the most efficient thing to do.

**Dwarkesh Patel:** So even if you got an AI to care about sentient beings—and it's not actually clear to me that that's what you should try to do if you solved alignment—it would still be the case that most sentient beings will be AIs. There will be trillions, eventually quadrillions, of AIs. Humans will be a very small fraction of sentient beings. So it's not clear to me, if the goal is some kind of human control over this future civilization, that this is the best criterion.

**Ilya Sutskever:** It's true. It's possible it's not the best criterion. I'll say two things.
Number one, care for sentient life, I think there is merit to it. It should be considered.
I think it would be helpful if there was some kind of short list of ideas that the companies, when they are in this situation, could use. That's number two. Number three, I think it would be really materially helpful if the power of the most powerful superintelligence was somehow capped because it would address a lot of these concerns. The question of how to do it, I'm not sure, but I think that would be materially helpful when you're talking about really, really powerful systems. Before we continue the alignment discussion, I want to double-click on that. How much room is there at the top? How do you think about superintelligence? Do you think, using this learning efficiency idea, maybe it is just extremely fast at learning new skills or new knowledge? Does it just have a bigger pool of strategies? Is there a single cohesive "it" in the center that's more powerful or bigger? If so, do you imagine that this will be sort of godlike in comparison to the rest of human civilization, or does it just feel like another agent, or another cluster of agents? This is an area where different people have different intuitions. I think it will be very powerful, for sure. What I think is most likely to happen is that there will be multiple such AIs being created roughly at the same time. I think that if the cluster is big enough—like if the cluster is literally continent-sized—that thing could be really powerful, indeed. If you literally have a continent-sized cluster, those AIs can be very powerful. All I can tell you is that if you're talking about extremely powerful AIs, truly dramatically powerful, it would be nice if they could be restrained in some ways or if there were some kind of agreement or something. What is the concern of superintelligence? What is one way to explain the concern? If you imagine a system that is sufficiently powerful, really sufficiently powerful—and you could say you need to do something sensible like care for sentient life in a very single-minded way—we might not like the results. That's really what it is. Maybe, by the way, the answer is that you do not build an RL agent in the usual sense. I'll point several things out. I think human beings are semi-RL agents. We pursue a reward, and then the emotions or whatever make us tire out of the reward and we pursue a different reward. The market is a very short-sighted kind of agent. Evolution is the same. Evolution is very intelligent in some ways, but very dumb in other ways. The government has been designed to be a never-ending fight between three parts, which has an effect. So I think things like this. Another thing that makes this discussion difficult is that we are talking about systems that don't exist, that we don't know how to build. That's the other thing and that's actually my belief. I think what people are doing right now will go some distance and then peter out. It will continue to improve, but it will also not be "it". The "It" we don't know how to build, and a lot hinges on understanding reliable generalization. I'll say another thing. One of the things that you could say about what causes alignment to be difficult is that your ability to learn human values is fragile. Then your ability to optimize them is fragile. You actually learn to optimize them. And can't you say, "Are these not all instances of unreliable generalization?" Why is it that human beings appear to generalize so much better? What if generalization was much better? What would happen in this case? What would be the effect? 
But those questions are right now still unanswerable. How does one think about what AI going well looks like? You've scoped out how AI might evolve. We'll have these sort of continual learning agents. AI will be very powerful. Maybe there will be many different AIs. How do you think about lots of continent-sized compute intelligences going around? How dangerous is that? How do we make that less dangerous? And how do we do that in a way that protects an equilibrium where there might be misaligned AIs out there and bad actors out there? Here's one reason why I liked "AI that cares for sentient life". We can debate on whether it's good or bad. But if the first N of these dramatic systems do care for, love, humanity or something, care for sentient life, obviously this also needs to be achieved. This needs to be achieved. So if this is achieved by the first N of those systems, then I can see it go well, at least for quite some time. Then there is the question of what happens in the long run. How do you achieve a long-run equilibrium? I think that there, there is an answer as well. I don't like this answer, but it needs to be considered. In the long run, you might say, "Okay, if you have a world where powerful AIs exist, in the short term, you could say you have universal high income. You have universal high income and we're all doing well." But what do the Buddhists say? "Change is the only constant." Things change. There is some kind of government, political structure thing, and it changes because these things have a shelf life. Some new government thing comes up and it functions, and then after some time it stops functioning. That's something that we see happening all the time. So I think for the long-run equilibrium, one approach is that you could say maybe every person will have an AI that will do their bidding, and that's good. If that could be maintained indefinitely, that's true. But the downside with that is then the AI goes and earns money for the person and advocates for their needs in the political sphere, and maybe then writes a little report saying, "Okay, here's what I've done, here's the situation," and the person says, "Great, keep it up." But the person is no longer a participant. Then you can say that's a precarious place to be in. I'm going to preface by saying I don't like this solution, but it is a solution. The solution is if people become part-AI with some kind of Neuralink++. Because what will happen as a result is that now the AI understands something, and we understand it too, because now the understanding is transmitted wholesale. So now if the AI is in some situation, you are involved in that situation yourself fully. I think this is the answer to the equilibrium. I wonder if the fact that emotions which were developed millions—or in many cases, billions—of years ago in a totally different environment are still guiding our actions so strongly is an example of alignment success.
To spell out what I mean—I don't know whether it's more accurate to call it a value function or reward function—but the brainstem has a directive where it's saying, "Mate with somebody who's more successful." The cortex is the part that understands what success means in the modern context. But the brainstem is able to align the cortex and say, "However you recognize success to be—and I'm not smart enough to understand what that is—you're still going to pursue this directive." I think there's a more general point. I think it's actually really mysterious how evolution encodes high-level desires. It's pretty easy to understand how evolution would endow us with the desire for food that smells good because smell is a chemical, so just pursue that chemical. It's very easy to imagine evolution doing that thing. But evolution also has endowed us with all these social desires. We really care about being seen positively by society. We care about being in good standing. All these social intuitions that we have, I feel strongly that they're baked in. I don't know how evolution did it because it's a high-level concept that's represented in the brain. Let's say you care about some social thing, it's not a low-level signal like smell. It's not something for which there is a sensor. The brain needs to do a lot of processing to piece together lots of bits of information to understand what's going on socially. Somehow evolution said, "That's what you should care about." How did it do it? It did it quickly, too. All these sophisticated social things that we care about, I think they evolved pretty recently. Evolution had an easy time hard-coding this high-level desire. I'm unaware of a good hypothesis for how it's done. I had some ideas I was kicking around, but none of them are satisfying. What's especially impressive is it was desire that you learned in your lifetime, it makes sense because your brain is intelligent. It makes sense why you would be able to learn intelligent desires. Maybe this is not your point, but one way to understand it is that the desire is built into the genome, and the genome is not intelligent. But you're somehow able to describe this feature. It's not even clear how you define that feature, and you can build it into the genes. Essentially, or maybe I'll put it differently. If you think about the tools that are available to the genome, it says, "Okay, here's a recipe for building a brain." You could say, "Here is a recipe for connecting the dopamine neurons to the smell sensor." And if the smell is a certain kind of good smell, you want to eat that. I could imagine the genome doing that. I'm claiming that it is harder to imagine. It's harder to imagine the genome saying you should care about some complicated computation that your entire brain, a big chunk of your brain, does. That's all I'm claiming. I can tell you a speculation of how it could be done. Let me offer a speculation, and I'll explain why the speculation is probably false. So the brain has brain regions. We have our cortex. It has all those brain regions. The cortex is uniform, but the brain regions and the neurons in the cortex kind of speak to their neighbors mostly. That explains why you get brain regions. Because if you want to do some kind of speech processing, all the neurons that do speech need to talk to each other. And because neurons can only speak to their nearby neighbors, for the most part, it has to be a region. All the regions are mostly located in the same place from person to person. 
So maybe evolution hard-coded literally a location on the brain. So it says, "Oh, when the GPS coordinates of the brain such and such, when that fires, that's what you should care about." Maybe that's what evolution did because that would be within the toolkit of evolution. Yeah, although there are examples where, for example, people who are born blind have that area of their cortex adopted by another sense. I have no idea, but I'd be surprised if the desires or the reward functions which require a visual signal no longer worked for people who have their different areas of their cortex co-opted. For example, if you no longer have vision, can you still feel the sense that I want people around me to like me and so forth, which usually there are also visual cues for. I fully agree with that. I think there's an even stronger counterargument to this theory. There are people who get half of their brains removed in childhood, and they still have all their brain regions. But they all somehow move to just one hemisphere, which suggests that the brain regions, their location is not fixed and so that theory is not true. It would have been cool if it was true, but it's not. So I think that's a mystery. But it's an interesting mystery. The fact is that somehow evolution was able to endow us to care about social stuff very, very reliably. Even people who have all kinds of strange mental conditions and deficiencies and emotional problems tend to care about this also. What is SSI planning on doing differently? Presumably your plan is to be one of the frontier companies when this time arrives. Presumably you started SSI because you're like, "I think I have a way of approaching how to do this safely in a way that the other companies don't." What is that difference? The way I would describe it is that there are some ideas that I think are promising and I want to investigate them and see if they are indeed promising or not. It's really that simple. It's an attempt. If the ideas turn out to be correct—these ideas that we discussed around understanding generalization—then I think we will have something worthy. Will they turn out to be correct? We are doing research. We are squarely an "age of research" company. We are making progress. We've actually made quite good progress over the past year, but we need to keep making more progress, more research. That's how I see it. I see it as an attempt to be a voice and a participant. Your cofounder and previous CEO left to go to Meta recently, and people have asked, "Well, if there were a lot of breakthroughs being made, that seems like a thing that should have been unlikely." I wonder how you respond.
**Dwarkesh Patel:** 我明白了。你在提出的关于超级智能的观点是,它不是某个完成的头脑,知道如何做经济中的每一份工作。因为比如说,OpenAI 的原始章程或其他什么定义 AGI 的方式是——它可以做每一份工作、人类能做的每一件事。你提出的是一个可以学会做每一份工作的头脑,那就是超级智能。
**Dwarkesh Patel:** I see. So the point you're making about superintelligence is that it's not some finished mind that knows how to do every job in the economy. Because the way, say, OpenAI's original charter defines AGI is that it can do every job, every single thing a human can do. What you're positing is a mind that can learn to do every job, and that is the superintelligence.
**Ilya Sutskever:** 是的。但一旦你有了这个学习算法,它就会以人类劳动者加入一个组织的方式被部署到世界中。
**Ilya Sutskever:** Yes. But once you have this learning algorithm, it will be deployed into the world the way a human worker joins an organization.
**Dwarkesh Patel:** 没错。看起来这两件事中的一件可能会发生,也许两件都不会发生。
一,这个超高效的学习算法变得超越人类水平,在 ML 研究这个任务上变得和你一样好甚至更好。结果算法本身变得越来越超越人类。
另一个是,即使那没有发生,如果你有一个单一的模型——这明确是你的愿景——模型的实例被部署到经济的各个角落,做不同的工作,学习如何做那些工作,在工作中持续学习,获得任何人类能获得的所有技能,但同时获得所有技能,然后整合它们的学习成果——你基本上就有了一个在功能上变得超级智能的模型,即使没有任何软件上的递归自我改进。因为你现在有了一个能做经济中每一份工作的模型,而人类无法以同样的方式融合我们的头脑。
那么你期望从广泛部署中产生某种智能爆发吗?
**Dwarkesh Patel:** That's right. It seems like one of these two things may happen, or maybe neither will.

One, this super-efficient learning algorithm becomes superhuman, becomes as good as you and potentially even better, at the task of ML research. As a result the algorithm itself becomes more and more superhuman.

The other is, even if that doesn't happen, if you have a single model—this is explicitly your vision—where instances of the model are deployed through the economy doing different jobs, learning how to do those jobs, continually learning on the job, picking up all the skills that any human could pick up, but picking them all up at the same time, and then amalgamating their learnings, you basically have a model which functionally becomes superintelligent even without any sort of recursive self-improvement in software. Because you now have one model that can do every single job in the economy, and humans can't merge our minds in the same way.

So do you expect some sort of intelligence explosion from broad deployment?
**Ilya Sutskever:** 我认为我们很可能会有快速的经济增长。我认为对于广泛部署,你可以提出两个相互矛盾的论点。
一个是,一旦你确实达到了一个点——你有一个 AI 可以快速学会做事情,而且你有很多这样的 AI——那么就会有强大的力量推动将它们部署到经济中,除非会有某种监管来阻止它,顺便说一下可能会有。但我认为在某段时间内非常快速的经济增长,从广泛部署的角度来看是非常可能的。
问题是它会有多快。我认为这很难知道,因为一方面你有这个非常高效的工人。另一方面,世界实在太大了,有太多东西,那些东西以不同的速度运转。但另一方面,现在 AI 可以……
所以我认为非常快速的经济增长是可能的。我们将看到各种事情,比如不同国家有不同的规则,那些有更友好规则的国家,经济增长将会更快。很难预测。
**Ilya Sutskever:** I think that it is likely that we will have rapid economic growth. I think with broad deployment, there are two arguments you could make which are conflicting. One is that once indeed you get to a point where you have an AI that can learn to do things quickly and you have many of them, then there will be a strong force to deploy them in the economy, unless there is some kind of regulation that stops it, which by the way there might be. But the idea of very rapid economic growth for some time, I think it's very possible from broad deployment.

The question is how rapid it's going to be. I think this is hard to know, because on the one hand you have this very efficient worker. On the other hand, the world is just really big and there's a lot of stuff, and that stuff moves at a different speed. But then on the other hand, now the AI could…

So I think very rapid economic growth is possible. We will see all kinds of things, like different countries with different rules, and in the ones with friendlier rules, the economic growth will be faster. Hard to predict.
**Dwarkesh Patel:** 对我来说,这似乎是一个非常危险的处境。在极限情况下,我们知道这应该是可能的。如果你有一个和人类一样擅长学习的东西,但它可以融合它的大脑——以人类无法融合的方式融合不同的实例——这已经看起来是一个物理上应该可能的事情。人类是可能的,数字计算机是可能的。你只需要把这两者结合起来就能产生这个东西。
这种东西看起来也是极其强大的。经济增长是一种说法。Dyson sphere 是很多经济增长。但另一种说法是,在潜在的非常短的时间内……你在 SSI 雇佣人,六个月后他们就有净产出了,大概是这样。一个人学习真的很快,而这个东西变得越来越聪明,非常快。
你如何思考让这件事顺利进行?为什么 SSI 有能力把这件事做好?SSI 在这方面的计划是什么,基本上就是我想问的。
**Dwarkesh Patel:** It seems to me that this is a very precarious situation to be in. In the limit, we know that this should be possible. If you have something that is as good as a human at learning, but which can merge its brains—merge different instances in a way that humans can't merge—already, this seems like a thing that should physically be possible. Humans are possible, digital computers are possible. You just need both of those combined to produce this thing.

It also seems this kind of thing is extremely powerful. Economic growth is one way to put it. A Dyson sphere is a lot of economic growth. But another way to put it is that you will have, in potentially a very short period of time… You hire people at SSI, and in six months, they're net productive, probably. A human learns really fast, and this thing is becoming smarter and smarter very fast.

How do you think about making that go well? Why is SSI positioned to do that well? What is SSI's plan there, is basically what I'm trying to ask.
**Ilya Sutskever:** 我的思考正在改变的一个方面是,我现在更加重视 AI 被渐进地、提前地部署。AI 最困难的事情之一是,我们在谈论尚不存在的系统,很难想象它们。
我认为正在发生的事情之一是,在实践中,很难感受到 AGI。很难感受到 AGI。我们可以谈论它,但想象一下当你年老体弱时讨论变老是什么感觉。你可以进行对话,你可以尝试想象,但这就是困难的,然后你回到现实中——那并非如此。
我认为围绕 AGI 及其未来力量的很多问题都源于这个事实——很难想象。未来的 AI 将会不同。它将会强大。确实,整个问题是什么——AI 和 AGI 的问题是什么?整个问题就是力量。当力量真的很大时,会发生什么?
在过去一年中我改变想法的一种方式——那个想法的改变,我会稍微保留一下,可能会反向传播到我们公司的计划中——是,如果很难想象,你该怎么办?你必须展示那个东西。
我坚持认为,大多数从事 AI 工作的人也无法想象它,因为它与人们在日常基础上看到的太不同了。
这是我预测将会发生的事情。这是一个预测。我坚持认为,随着 AI 变得更强大,人们将改变他们的行为。我们将看到各种前所未有的事情,这些事情现在还没有发生。
我举一些例子。我认为不管好坏,前沿公司将在接下来发生的事情中扮演非常重要的角色,政府也一样。我认为你会看到的那种事情——你已经看到了开端——是那些激烈的竞争对手开始在 AI 安全方面合作。你可能已经看到 OpenAI 和 Anthropic 迈出了第一小步,但那以前是不存在的。那是我大约三年前在一次演讲中预测会发生的事情。
我还坚持认为,随着 AI 继续变得更强大、更明显地强大,政府和公众也会有做点什么的愿望。
我认为这是一股非常重要的力量——展示 AI。那是第一点。
第二点,好吧,所以 AI 正在被构建。需要做什么?我坚持认为将会发生的一件事是,现在从事 AI 工作的人——我坚持认为 AI 并不感觉强大,因为它的错误。我确实认为在某个时候 AI 将开始实际上感觉强大。我认为当那发生时,我们将看到所有 AI 公司在安全方面的做法发生巨大变化。他们会变得更加偏执。我作为一个预测来说这个——我们将会看到这发生。我们看看我是否正确。但我认为这将会发生,因为他们会看到 AI 变得更强大。
现在正在发生的一切,我坚持认为,是因为人们看着今天的 AI,很难想象未来的 AI。
还有第三件需要发生的事情。我在更广泛的层面上谈论这个,不仅仅是从 SSI 的角度,因为你问了我关于我们公司的问题。问题是,公司应该致力于构建什么?他们应该致力于构建什么?
有一个大家一直被锁定的大想法,那就是自我改进的 AI。为什么会这样?因为想法比公司少。但我坚持认为,有一些更好的东西可以构建,我认为每个人都会想要那个。那就是鲁棒地对齐到关心有意识生命的 AI,具体来说。
我特别认为,有理由相信构建一个关心有意识生命的 AI 比构建一个只关心人类生命的 AI 更容易,因为 AI 本身也将是有意识的。如果你想想镜像神经元和人类对动物的同理心——你可能会说它不够大,但它存在。我认为这是一个涌现特性,来自于我们用同一个回路来模拟他人和模拟自己,因为这是最高效的做法。
**Ilya Sutskever:** One of the ways in which my thinking has been changing is that I now place more importance on AI being deployed incrementally and in advance. One very difficult thing about AI is that we are talking about systems that don't yet exist, and it's hard to imagine them.

I think that one of the things that's happening is that in practice, it's very hard to feel the AGI. It's very hard to feel the AGI. We can talk about it, but imagine having a conversation about what it is like to be old when you're old and frail. You can have a conversation, you can try to imagine it, but it's just hard, and you come back to reality where that's not the case.

I think that a lot of the issues around AGI and its future power stem from the fact that it's very difficult to imagine. Future AI is going to be different. It's going to be powerful. Indeed, what is the whole problem of AI and AGI? The whole problem is the power. When the power is really big, what's going to happen?

One of the ways in which I've changed my mind over the past year—and that change of mind, I'll hedge a little bit, may back-propagate into the plans of our company—is this: if it's hard to imagine, what do you do? You've got to be showing the thing. I maintain that most people who work on AI also can't imagine it, because it's too different from what people see on a day-to-day basis.

Here's something which I predict will happen. This is a prediction. I maintain that as AI becomes more powerful, people will change their behaviors. We will see all kinds of unprecedented things which are not happening right now.

I'll give some examples. I think for better or worse, the frontier companies will play a very important role in what happens, as will the government. The kind of things that I think you'll see, which you see the beginnings of, are companies that are fierce competitors starting to collaborate on AI safety. You may have seen OpenAI and Anthropic doing a first small step, but that did not exist before. That's something I predicted in one of my talks about three years ago, that such a thing would happen. I also maintain that as AI continues to become more powerful, more visibly powerful, there will be a desire from governments and the public to do something.

I think this is a very important force, showing the AI. That's number one.

Number two, okay, so the AI is being built. What needs to be done? One thing that I maintain will happen is this: right now, to the people working on AI, the AI doesn't feel powerful, because of its mistakes. I do think that at some point the AI will start to actually feel powerful. I think when that happens, we will see a big change in the way all AI companies approach safety. They'll become much more paranoid. I say this as a prediction that we will see happen. We'll see if I'm right. But I think this is something that will happen because they will see the AI becoming more powerful. Everything that's happening right now, I maintain, is because people look at today's AI and find it hard to imagine the future AI.

There is a third thing that needs to happen. I'm talking about it in broader terms, not just from the perspective of SSI, because you asked me about our company. The question is, what should the companies aspire to build? What should they aspire to build? There has been one big idea that everyone has been locked into, which is the self-improving AI. Why did it happen? Because there are fewer ideas than companies. But I maintain that there is something better to build, and I think that everyone will want it. It's the AI that's robustly aligned to care about sentient life specifically.

In particular, I think there's a case to be made that it will be easier to build an AI that cares about sentient life than an AI that cares about human life alone, because the AI itself will be sentient. And if you think about things like mirror neurons and human empathy for animals—which you might argue is not big enough, but it exists—I think it's an emergent property of the fact that we model others with the same circuit that we use to model ourselves, because that's the most efficient thing to do.
**Dwarkesh Patel:** 所以即使你让一个 AI 关心有意识的存在——而且对我来说,这是否应该是你在解决了对齐问题后应该尝试做的事情,这一点实际上并不清楚——情况仍然是,大多数有意识的存在将是 AI。将会有数万亿、最终数千万亿个 AI。人类将只是有意识存在中非常小的一部分。所以对我来说,如果目标是某种人类对这个未来文明的控制,这是否是最好的标准,这一点并不清楚。
**Dwarkesh Patel:** So even if you got an AI to care about sentient beings—and it's not actually clear to me that that's what you should try to do if you solved alignment—it would still be the case that most sentient beings will be AIs. There will be trillions, eventually quadrillions, of AIs. Humans will be a very small fraction of sentient beings. So if the goal is some kind of human control over this future civilization, it's not clear to me that this is the best criterion.
**Ilya Sutskever:** 这是对的。它可能不是最好的标准。我会说两件事。
第一,关心有意识的生命,我认为这有其价值。它应该被考虑。
我认为如果有某种简短的想法清单——当公司处于这种情况时可以使用的——会很有帮助。那是第二点。
第三,我认为如果最强大的超级智能的力量以某种方式被限制,这将是实质性的帮助,因为它将解决很多这些担忧。如何做到这一点,我不确定,但我认为当你谈论真正、真正强大的系统时,那将是实质性的帮助。
**Ilya Sutskever:** It's true. It's possible it's not the best criterion. I'll say two things.

Number one, care for sentient life, I think there is merit to it. It should be considered.

I think it would be helpful if there was some kind of short list of ideas that the companies, when they are in this situation, could use. That's number two. Number three, I think it would be really materially helpful if the power of the most powerful superintelligence was somehow capped, because it would address a lot of these concerns. The question of how to do it, I'm not sure, but I think that would be materially helpful when you're talking about really, really powerful systems.
**Dwarkesh Patel:** 在我们继续对齐讨论之前,我想深入探讨一下那个。顶部有多少空间?你如何思考超级智能?你认为,用这个学习效率的想法,也许它只是在学习新技能或新知识方面极其快速?它只是有一个更大的策略池?中心是否有一个单一的、连贯的"它"更强大或更大?如果是这样,你想象这将与人类文明的其他部分相比像神一样,还是只是感觉像另一个 agent,或另一组 agent?
**Dwarkesh Patel:** Before we continue the alignment discussion, I want to double-click on that. How much room is there at the top? How do you think about superintelligence? Do you think, using this learning efficiency idea, maybe it is just extremely fast at learning new skills or new knowledge? Does it just have a bigger pool of strategies? Is there a single cohesive "it" in the center that's more powerful or bigger? If so, do you imagine that this will be sort of godlike in comparison to the rest of human civilization, or does it just feel like another agent, or another cluster of agents?
**Ilya Sutskever:** 这是一个不同的人有不同直觉的领域。我认为它将非常强大,这是确定的。我认为最可能发生的是,将有多个这样的 AI 大致同时被创造出来。
我认为如果集群足够大——比如如果集群真的是大陆级别的——那个东西可以真的很强大。如果你真的有一个大陆大小的集群,那些 AI 可以非常强大。
我能告诉你的是,如果你在谈论极其强大的 AI,真正极其强大的,如果它们能够以某种方式被约束,或者如果有某种协议或其他什么,那将是好的。
**Ilya Sutskever:** This is an area where different people have different intuitions. I think it will be very powerful, for sure. What I think is most likely to happen is that there will be multiple such AIs being created roughly at the same time.

I think that if the cluster is big enough—like if the cluster is literally continent-sized—that thing could be really powerful, indeed. If you literally have a continent-sized cluster, those AIs can be very powerful.

All I can tell you is that if you're talking about extremely powerful AIs, truly dramatically powerful, it would be nice if they could be restrained in some ways or if there were some kind of agreement or something.
**Dwarkesh Patel:** 超级智能的担忧是什么?有什么方式来解释这种担忧?
**Dwarkesh Patel:** What is the concern of superintelligence? What is one way to explain the concern?
**Ilya Sutskever:** 如果你想象一个足够强大的系统,真正足够强大的——你可能会说你需要做一些理智的事情,比如以非常专注的方式关心有意识的生命——我们可能不会喜欢结果。这真的就是问题所在。
也许,顺便说一下,答案是你不要按照通常的意义构建一个 RL agent。我指出几点。我认为人类是半 RL agent。我们追求一个奖励,然后情绪或其他什么使我们厌倦了那个奖励,我们就追求一个不同的奖励。
市场是一种非常短视的 agent。进化也一样。进化在某些方面非常聪明,但在其他方面非常愚蠢。政府被设计成三个部分之间永无止境的斗争,这产生了某种效果。所以我认为类似这样的事情。
另一件使这个讨论困难的事情是,我们在谈论不存在的、我们不知道如何构建的系统。那是另一件事,那实际上是我的信念。我认为人们现在做的事情会走一段距离然后逐渐减弱。它将继续改进,但它也不会是"它"。那个"它"我们不知道如何构建,而很多事情取决于理解可靠的泛化。
我再说一件事。你可以说导致对齐困难的一件事是,你学习人类价值观的能力是脆弱的。然后你优化它们的能力也是脆弱的。你实际上学会了优化它们。难道你不能说,"这些不都是不可靠泛化的例子吗?"为什么人类看起来泛化得好得多?如果泛化好得多会怎样?在这种情况下会有什么效果?
但这些问题目前仍然无法回答。
**Ilya Sutskever:** If you imagine a system that is sufficiently powerful, really sufficiently powerful—and you could say you need to do something sensible like care for sentient life in a very single-minded way—we might not like the results. That's really what it is.

Maybe, by the way, the answer is that you do not build an RL agent in the usual sense. I'll point several things out. I think human beings are semi-RL agents. We pursue a reward, and then the emotions or whatever make us tire out of the reward and we pursue a different reward.

The market is a very short-sighted kind of agent. Evolution is the same. Evolution is very intelligent in some ways, but very dumb in other ways. The government has been designed to be a never-ending fight between three parts, which has an effect. So I think things like this.

Another thing that makes this discussion difficult is that we are talking about systems that don't exist, that we don't know how to build. That's the other thing, and that's actually my belief. I think what people are doing right now will go some distance and then peter out. It will continue to improve, but it will also not be "it". The "it" we don't know how to build, and a lot hinges on understanding reliable generalization.

I'll say another thing. One of the things that you could say about what causes alignment to be difficult is that your ability to learn human values is fragile. Then your ability to optimize them is fragile. You actually learn to optimize them. And can't you say, "Are these not all instances of unreliable generalization?" Why is it that human beings appear to generalize so much better? What if generalization was much better? What would happen in this case? What would be the effect?

But those questions are right now still unanswerable.
**Dwarkesh Patel:** 一个人如何思考 AI 顺利进行是什么样子的?你已经勾勒了 AI 可能如何演变。我们将有这些持续学习的 agent。AI 将非常强大。也许会有许多不同的 AI。你如何看待许多大陆级计算智能到处运行的情况?那有多危险?我们如何使它不那么危险?我们如何以一种保护均衡的方式来做到这一点——在那个均衡中可能有未对齐的 AI 和恶意行为者?
**Dwarkesh Patel:** How does one think about what AI going well looks like? You've scoped out how AI might evolve. We'll have these sort of continual learning agents. AI will be very powerful. Maybe there will be many different AIs. How do you think about lots of continent-sized compute intelligences going around? How dangerous is that? How do we make that less dangerous? And how do we do that in a way that protects an equilibrium where there might be misaligned AIs out there and bad actors out there?
**Ilya Sutskever:** 这是我喜欢"关心有意识生命的 AI"的一个原因。我们可以辩论它是好是坏。但如果前 N 个这些戏剧性的系统确实关心、热爱人类或其他什么、关心有意识的生命——显然这也需要被实现。这需要被实现。
所以如果这被前 N 个系统实现了,那么我可以看到它至少在相当长的一段时间内顺利进行。
然后是长期会发生什么的问题。如何实现长期均衡?我认为那里也有一个答案。我不喜欢这个答案,但它需要被考虑。
长期来看,你可能会说,"好吧,如果你有一个强大的 AI 存在的世界,短期来说,你可以说你有普遍的高收入。你有普遍的高收入,我们都过得很好。"但佛教徒怎么说?"变化是唯一的常量。"事情会变。有某种政府、政治结构的东西,它会变,因为这些东西有保质期。某种新的政府的东西出现了,它运作了,然后过了一段时间它就不运作了。那是我们一直在看到的事情。
所以我认为对于长期均衡,一种方法是你可以说,也许每个人都会有一个 AI 为他们服务,那是好的。如果那能被无限期地维持,那是对的。但这样做的缺点是,AI 去为那个人赚钱,在政治领域为他们的需求倡导,也许然后写一份小报告说,"好的,这是我做的事情,这是情况,"然后那个人说,"很好,继续。"但那个人不再是一个参与者了。
那么你可以说那是一个不稳定的处境。
我要先说我不喜欢这个解决方案,但它是一个解决方案。解决方案是如果人们通过某种 Neuralink++ 变成半 AI。因为这样做的结果是,现在 AI 理解了某些东西,我们也理解了,因为现在那种理解被整体传输了。所以现在如果 AI 处于某种情境中,你自己也完全参与在那个情境中。
我认为这就是均衡的答案。
**Ilya Sutskever:** Here's one reason why I liked "AI that cares for sentient life". We can debate whether it's good or bad. But if the first N of these dramatic systems do care for, love, humanity or something, care for sentient life—obviously this also needs to be achieved. This needs to be achieved. So if this is achieved by the first N of those systems, then I can see it go well, at least for quite some time.

Then there is the question of what happens in the long run. How do you achieve a long-run equilibrium? I think that there, there is an answer as well. I don't like this answer, but it needs to be considered.

In the long run, you might say, "Okay, if you have a world where powerful AIs exist, in the short term, you could say you have universal high income. You have universal high income and we're all doing well." But what do the Buddhists say? "Change is the only constant." Things change. There is some kind of government, political structure thing, and it changes, because these things have a shelf life. Some new government thing comes up and it functions, and then after some time it stops functioning. That's something that we see happening all the time.

So I think for the long-run equilibrium, one approach is that you could say maybe every person will have an AI that will do their bidding, and that's good. If that could be maintained indefinitely, that's true. But the downside with that is then the AI goes and earns money for the person and advocates for their needs in the political sphere, and maybe then writes a little report saying, "Okay, here's what I've done, here's the situation," and the person says, "Great, keep it up." But the person is no longer a participant. Then you can say that's a precarious place to be in.

I'm going to preface by saying I don't like this solution, but it is a solution. The solution is if people become part-AI with some kind of Neuralink++. Because what will happen as a result is that now the AI understands something, and we understand it too, because now the understanding is transmitted wholesale. So now if the AI is in some situation, you are involved in that situation yourself fully. I think this is the answer to the equilibrium.
**Dwarkesh Patel:** 我想知道,数百万年前——或者在很多情况下,数十亿年前——在一个完全不同的环境中发展起来的情绪仍然如此强烈地指导我们的行为,这是否是对齐成功的一个例子。
具体来说我的意思是——我不知道叫它 value function 还是 reward function 更准确——但脑干有一个指令说,"与更成功的人交配。"大脑皮层是理解在现代语境下成功意味着什么的部分。但脑干能够对齐大脑皮层并说,"无论你如何认识到成功是什么——而我不够聪明,无法理解那是什么——你仍然会追求这个指令。"
**Dwarkesh Patel:** I wonder if the fact that emotions which were developed millions—or in many cases, billions—of years ago in a totally different environment are still guiding our actions so strongly is an example of alignment success.

To spell out what I mean—I don't know whether it's more accurate to call it a value function or a reward function—the brainstem has a directive where it's saying, "Mate with somebody who's more successful." The cortex is the part that understands what success means in the modern context. But the brainstem is able to align the cortex and say, "However you recognize success to be—and I'm not smart enough to understand what that is—you're still going to pursue this directive."
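A minimal sketch to make this concrete. Nothing below comes from the conversation; the profile fields and weights are invented for illustration. The point is the structure Dwarkesh is describing: a fixed, "dumb" reward module that scores a concept it cannot compute itself, delegating the recognition of that concept to a learned component.

```python
# Toy brainstem/cortex split: a hard-coded directive defined over the output
# of a learned recognizer. All names and numbers here are hypothetical.

def cortex_success_score(profile):
    """Stands in for the learned part: an experience-dependent judgment of
    what 'success' means in the modern context."""
    return 0.6 * profile["status"] + 0.4 * profile["resources"]

def brainstem_reward(profile, recognize_success):
    """The fixed directive: pursue whatever the recognizer calls success,
    without understanding how the recognizer computes it."""
    return recognize_success(profile)

candidate = {"status": 0.9, "resources": 0.4}
print(brainstem_reward(candidate, cortex_success_score))  # ≈ 0.7
```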
**Ilya Sutskever:** 我认为有一个更一般的观点。我认为进化如何编码高层次的欲望实际上真的很神秘。理解进化如何赋予我们对闻起来好的食物的欲望是很容易的,因为气味是一种化学物质,所以就追求那种化学物质。很容易想象进化做那种事情。
但进化也赋予了我们所有这些社交欲望。我们真的很在意被社会正面看待。我们在意处于良好的社会地位。所有这些我们拥有的社交直觉,我强烈地感觉它们是内置的。
我不知道进化是如何做到的,因为这是一个在大脑中表示的高层次概念。假设你关心某个社交方面的事情,它不是像气味那样的低层次信号。它不是有传感器的东西。大脑需要做大量的处理来拼凑许多信息碎片,以理解社交方面正在发生什么。
不知怎的,进化说,"那就是你应该关心的。"它是怎么做到的?
而且它做得很快。所有这些我们关心的复杂社交事物,我认为它们是相当近期才进化出来的。进化竟然能轻松地硬编码这种高层次的欲望。我不知道有什么好的假说来解释它是如何做到的。
我有一些我在琢磨的想法,但没有一个是令人满意的。
**Ilya Sutskever:** I think there's a more general point. I think it's actually really mysterious how evolution encodes high-level desires. It's pretty easy to understand how evolution would endow us with the desire for food that smells good, because smell is a chemical, so just pursue that chemical. It's very easy to imagine evolution doing that thing.

But evolution also has endowed us with all these social desires. We really care about being seen positively by society. We care about being in good standing. All these social intuitions that we have, I feel strongly that they're baked in. I don't know how evolution did it, because it's a high-level concept that's represented in the brain. Let's say you care about some social thing: it's not a low-level signal like smell. It's not something for which there is a sensor. The brain needs to do a lot of processing to piece together lots of bits of information to understand what's going on socially.

Somehow evolution said, "That's what you should care about." How did it do it?

It did it quickly, too. All these sophisticated social things that we care about, I think they evolved pretty recently, and evolution had an easy time hard-coding this high-level desire. I'm unaware of a good hypothesis for how it's done.

I had some ideas I was kicking around, but none of them are satisfying.
**Dwarkesh Patel:** 特别令人印象深刻的是,如果是你在一生中学到的欲望,那说得通,因为你的大脑是智能的。你能够学到智能的欲望是说得通的。也许这不是你的观点,但一种理解方式是,欲望是内置在基因组中的,而基因组不是智能的。但你不知怎的能够描述这个特征。甚至不清楚你如何定义那个特征,而你可以把它构建到基因中。
**Dwarkesh Patel:** What's especially impressive is that if it were a desire you learned in your lifetime, it would make sense, because your brain is intelligent. It makes sense why you would be able to learn intelligent desires. Maybe this is not your point, but one way to understand it is that the desire is built into the genome, and the genome is not intelligent. But you're somehow able to describe this feature—it's not even clear how you define that feature—and you can build it into the genes.
**Ilya Sutskever:** 本质上是这样,或者也许我换个方式说。如果你想想基因组可用的工具,它说,"好的,这是构建大脑的配方。"你可以说,"这是将多巴胺神经元连接到气味传感器的配方。"如果气味是某种好闻的气味,你就想吃那个。我可以想象基因组做到那个。
我要说的是,更难想象的是——更难想象基因组说你应该关心某种复杂的计算,那种你整个大脑、大脑的一大块都在进行的计算。那就是我要说的全部。
我可以告诉你一个关于它可能如何做到的推测。让我提出一个推测,然后我会解释为什么这个推测可能是错的。
大脑有脑区。我们有大脑皮层。它有所有那些脑区。大脑皮层是均匀的,但脑区和大脑皮层中的神经元主要和它们的邻居说话。这就解释了为什么你有脑区。因为如果你想做某种语言处理,所有做语言处理的神经元需要互相交流。因为神经元大部分只能和附近的邻居说话,所以它必须是一个区域。
所有的区域在不同人身上大致位于相同的位置。所以也许进化实际上硬编码了大脑上的一个位置。所以它说,"哦,当大脑的 GPS 坐标某某某、当那里激活时,那就是你应该关心的。"也许这就是进化所做的,因为那将在进化的工具包之内。
**Ilya Sutskever:** Essentially. Or maybe I'll put it differently. If you think about the tools that are available to the genome, it says, "Okay, here's a recipe for building a brain." You could say, "Here is a recipe for connecting the dopamine neurons to the smell sensor." And if the smell is a certain kind of good smell, you want to eat that. I could imagine the genome doing that. I'm claiming that it is harder to imagine the genome saying you should care about some complicated computation that your entire brain, a big chunk of your brain, does. That's all I'm claiming.

I can tell you a speculation of how it could be done. Let me offer a speculation, and I'll explain why the speculation is probably false. The brain has brain regions. We have our cortex. It has all those brain regions. The cortex is uniform, but the brain regions and the neurons in the cortex kind of speak to their neighbors mostly. That explains why you get brain regions: if you want to do some kind of speech processing, all the neurons that do speech need to talk to each other, and because neurons can, for the most part, only speak to their nearby neighbors, it has to be a region.

All the regions are mostly located in the same place from person to person. So maybe evolution hard-coded literally a location on the brain. It says, "Oh, when the GPS coordinates of the brain are such and such, when that fires, that's what you should care about." Maybe that's what evolution did, because that would be within the toolkit of evolution.
**Dwarkesh Patel:** 是的,虽然有一些例子,比如天生失明的人,他们大脑皮层的那个区域被另一种感官接管了。我不知道,但如果那些需要视觉信号的欲望或奖励函数对那些大脑皮层不同区域被征用的人不再起作用,我会感到惊讶。
例如,如果你不再有视觉,你还能感受到"我希望周围的人喜欢我"这种感觉吗?等等,通常这也有视觉线索。
**Dwarkesh Patel:** Yeah, although there are examples where, say, people who are born blind have that area of their cortex adopted by another sense. I have no idea, but I'd be surprised if the desires or the reward functions which require a visual signal no longer worked for people who have had different areas of their cortex co-opted. For example, if you no longer have vision, can you still feel the sense of "I want people around me to like me" and so forth, which usually there are also visual cues for?
**Ilya Sutskever:** 我完全同意那一点。我认为对这个理论有一个更强的反驳论据。有些人在童年时期切除了半个大脑,他们仍然拥有所有的脑区。但它们不知怎的都移到了只有一个半球,这表明脑区的位置不是固定的,所以那个理论不是对的。如果它是对的会很酷,但它不是。
所以我认为那是一个谜。但这是一个有趣的谜。事实是,不知怎的,进化能够让我们非常、非常可靠地关心社交方面的事物。即使是那些有各种奇怪的精神状况和缺陷以及情感问题的人,也倾向于关心这些。
**Ilya Sutskever:** I fully agree with that. I think there's an even stronger counterargument to this theory. There are people who get half of their brains removed in childhood, and they still have all their brain regions. But they all somehow move to just one hemisphere, which suggests that the brain regions, their location is not fixed and so that theory is not true. It would have been cool if it was true, but it's not. So I think that's a mystery. But it's an interesting mystery. The fact is that somehow evolution was able to endow us to care about social stuff very, very reliably. Even people who have all kinds of strange mental conditions and deficiencies and emotional problems tend to care about this also.
**Dwarkesh Patel:** SSI 计划做什么不同的事情?大概你的计划是在这个时刻到来时成为前沿公司之一。大概你创办 SSI 是因为你觉得,"我认为我有一种安全地做这件事的方法,而其他公司没有。"那个区别是什么?
**Dwarkesh Patel:** What is SSI planning on doing differently? Presumably your plan is to be one of the frontier companies when this time arrives. Presumably you started SSI because you're like, "I think I have a way of approaching how to do this safely in a way that the other companies don't." What is that difference?
**Ilya Sutskever:** 我会这样描述:有一些我认为很有前途的想法,我想要研究它们,看看它们是否确实有前途。就这么简单。这是一个尝试。如果这些想法被证明是正确的——我们讨论过的那些关于理解泛化的想法——那么我认为我们将拥有有价值的东西。
它们会被证明是正确的吗?我们正在做研究。我们完全是一家"研究时代"的公司。我们在取得进展。我们实际上在过去一年中取得了相当好的进展,但我们需要继续取得更多进展,更多研究。那就是我的看法。我把它看作是一个尝试——成为一个声音和一个参与者。
**Ilya Sutskever:** The way I would describe it is that there are some ideas that I think are promising and I want to investigate them and see if they are indeed promising or not. It's really that simple. It's an attempt. If the ideas turn out to be correct—these ideas that we discussed around understanding generalization—then I think we will have something worthy. Will they turn out to be correct? We are doing research. We are squarely an "age of research" company. We are making progress. We've actually made quite good progress over the past year, but we need to keep making more progress, more research. That's how I see it. I see it as an attempt to be a voice and a participant.
**Dwarkesh Patel:** 你的联合创始人和前 CEO 最近离开去了 Meta,人们问过,"嗯,如果正在取得很多突破,那这件事发生的概率应该很低。"我想知道你怎么回应。
**Dwarkesh Patel:** Your cofounder and previous CEO left to go to Meta recently, and people have asked, "Well, if there were a lot of breakthroughs being made, that seems like a thing that should have been unlikely." I wonder how you respond.
**Ilya Sutskever:** 对此,我会简单地提醒一些可能已经被遗忘的事实。我认为这些提供背景的事实解释了这个情况。
背景是我们当时正在以 320 亿美元的估值融资,然后 Meta 来了并提出收购我们,我说不。但我的前联合创始人在某种意义上说了是。结果,他也能享受到大量的短期流动性,而他是唯一一个从 SSI 加入 Meta 的人。
**Ilya Sutskever:** For this, I will simply recall a few facts that may have been forgotten. I think these facts, which provide the context, explain the situation. The context was that we were fundraising at a $32 billion valuation, and then Meta came in and offered to acquire us, and I said no. But my former cofounder, in some sense, said yes. As a result, he was also able to enjoy a lot of near-term liquidity, and he was the only person from SSI to join Meta.
**Dwarkesh Patel:** 听起来 SSI 的计划是成为一家在人类历史上这个非常重要的时期——当你拥有超人类智能的时候——处于前沿的公司。你有关于如何让超人类智能顺利进行的想法。但其他公司也会尝试他们自己的想法。是什么使 SSI 的方法——让超级智能顺利进行的方法——与众不同?
**Dwarkesh Patel:** It sounds like SSI's plan is to be a company that is at the frontier when you get to this very important period in human history where you have superhuman intelligence. You have these ideas about how to make superhuman intelligence go well. But other companies will be trying their own ideas. What distinguishes SSI's approach to making superintelligence go well?
**Ilya Sutskever:** 使 SSI 与众不同的主要东西是它的技术路径。我们有一种不同的技术路径,我认为它是有价值的,我们正在追求它。
我坚持认为最终会有策略的趋同。我认为会有策略的趋同,在某个时候,随着 AI 变得更强大,对每个人来说什么应该是策略将或多或少变得清楚。应该是类似于——你需要找到某种方式互相交流,你希望你的第一个真正的超级智能 AI 是对齐的,并且以某种方式关心有意识的生命、关心人类、民主的,其中某个、某种组合。
我认为这是每个人都应该为之努力的条件。那就是 SSI 正在为之努力的。我认为到了这个时候——如果不是已经的话——所有其他公司都会意识到他们也在朝着同样的方向努力。
我们拭目以待。我认为世界将真正改变,随着 AI 变得更强大。我认为事情将真正不同,人们的行为将真正不同。
**Ilya Sutskever:** The main thing that distinguishes SSI is its technical approach. We have a different technical approach that I think is worthy, and we are pursuing it. I maintain that in the end there will be a convergence of strategies. I think there will be a convergence of strategies where at some point, as AI becomes more powerful, it's going to become more or less clear to everyone what the strategy should be. It should be something like: you need to find some way to talk to each other, and you want your first actual real superintelligent AI to be aligned and somehow care for sentient life, care for people, be democratic, one of those, some combination thereof. I think this is the condition that everyone should strive for. That's what SSI is striving for. I think that by this time, if not already, all the other companies will realize that they're striving towards the same thing. We'll see. I think that the world will truly change as AI becomes more powerful. I think things will be really different and people will be acting really differently.
**Dwarkesh Patel:** 说到预测,你对你描述的这个系统——可以像人类一样学习并因此变得超越人类的系统——的预测是什么?
**Dwarkesh Patel:** Speaking of forecasts, what are your forecasts for this system you're describing, which can learn as well as a human and subsequently, as a result, become superhuman?
**Ilya Sutskever:** 我认为大概 5 到 20 年。
**Ilya Sutskever:** I think like 5 to 20.
**Dwarkesh Patel:** 5 到 20 年?
**Dwarkesh Patel:** 5 to 20 years?
**Ilya Sutskever:** 嗯嗯。
**Ilya Sutskever:** Mhm.
**Dwarkesh Patel:** 我只是想展开看看你可能如何看待世界的发展。就是说,我们还有几年时间,其他公司在继续当前的方法,然后它停滞了。这里的"停滞"意味着它们的收入不超过低千亿美元?你如何思考停滞意味着什么?
**Dwarkesh Patel:** I just want to unroll how you might see the world coming. It's like, we have a couple more years where these other companies are continuing the current approach and it stalls out. "Stalls out" here meaning they earn no more than low hundreds of billions in revenue? How do you think about what stalling out means?
**Ilya Sutskever:** 我认为停滞看起来会……在所有不同的公司之间看起来都非常相似。可能会是这样。我不确定,因为我认为即使停滞了,我认为这些公司也能获得惊人的收入。也许不是利润,因为他们需要努力让自己与彼此差异化,但收入肯定可以。
**Ilya Sutskever:** I think stalling out will look like… it will all look very similar among all the different companies. It could be something like this. I'm not sure, because I think even with stalling out, these companies could make a stupendous revenue. Maybe not profits, because they will need to work hard to differentiate themselves from each other, but revenue definitely.
**Dwarkesh Patel:** 但你的模型中有某些东西暗示,当正确的解决方案确实出现时,所有公司之间会趋同。我很好奇你为什么认为是这样。
**Dwarkesh Patel:** But something in your model implies that when the correct solution does emerge, there will be convergence between all the companies. I'm curious why you think that's the case.
**Ilya Sutskever:** 我更多是在说他们对齐策略的趋同。我认为技术路径的最终趋同可能也会发生,但我是在暗示对齐策略的趋同。应该做什么确切的事情。
**Ilya Sutskever:** I was talking more about convergence on their alignment strategies. I think eventual convergence on the technical approach is probably going to happen as well, but I was alluding to convergence on the alignment strategies: what exactly is the thing that should be done.
**Dwarkesh Patel:** 我只是想更好地理解你如何看待未来的展开。目前,我们有这些不同的公司,你预期他们的方法会继续产生收入但不会达到这种类人学习者的水平。所以现在我们有这些不同的公司分支。我们有你,我们有 Thinking Machines,还有一批其他的实验室。也许其中一个找到了正确的方法。但他们的产品发布会让其他人清楚如何做这件事。
**Dwarkesh Patel:** I just want to better understand how you see the future unrolling. Currently, we have these different companies, and you expect their approach to continue generating revenue but not get to this human-like learner. So now we have these different forks of companies. We have you, we have Thinking Machines, there's a bunch of other labs. Maybe one of them figures out the correct approach. But then the release of their product makes it clear to other people how to do this thing.
**Ilya Sutskever:** 我认为不会清楚如何做到,但会清楚有一些不同的东西是可能的,而那就是信息。人们然后会试图弄清楚那是怎么运作的。
我确实认为,这里没有提到的、没有讨论的一件事是,随着 AI 能力的每一次提升,我认为事情的做法会有某种变化,但我不知道确切是哪些变化。我认为这将很重要,但我无法准确说明那是什么。
**Ilya Sutskever:** I think it won't be clear how to do it, but it will be clear that something different is possible, and that is information. People will then be trying to figure out how that works. I do think though that one of the things not addressed here, not discussed, is that with each increase in the AI's capabilities, I think there will be some kind of changes, but I don't know exactly which ones, in how things are being done. I think it's going to be important, yet I can't spell out what that is exactly.
**Dwarkesh Patel:** 默认情况下,你会期望拥有那个模型的公司获得所有这些收益,因为他们有那个拥有在世界中积累的技能和知识的模型。有什么理由认为这些收益会被广泛分配,而不是仅仅落到无论哪个模型公司首先启动了这个持续学习循环上?
**Dwarkesh Patel:** By default, you would expect the company that has that model to be getting all these gains because they have the model that has the skills and knowledge that it's building up in the world. What is the reason to think that the benefits of that would be widely distributed and not just end up at whatever model company gets this continuous learning loop going first?
**Ilya Sutskever:** 这是我认为将会发生的。第一,让我们看看到目前为止过去的 AI 是怎么发展的。一家公司产生了一个进步,另一家公司奋力追赶并在一段时间后产生了一些类似的东西,然后他们开始在市场上竞争并压低价格。所以我认为从市场的角度来看,类似的事情也会在那里发生。
我们在谈论好的世界,顺便说一下。
**Ilya Sutskever:** Here is what I think is going to happen. Number one, let's look at how things have gone so far with the AIs of the past. One company produced an advance and the other company scrambled and produced some similar things after some amount of time and they started to compete in the market and push the prices down. So I think from the market perspective, something similar will happen there as well. We are talking about the good world, by the way.
**Dwarkesh Patel:** 什么是好的世界?
**Dwarkesh Patel:** What's the good world?
**Ilya Sutskever:** 就是我们拥有这些强大的类人学习者,它们也是……顺便说一下,也许还有另一件我们没有讨论过的事情,关于超级智能 AI 的规格,我认为值得考虑。那就是你让它变得窄,它可以同时是有用的和窄的。你可以拥有很多窄的超级智能 AI。
但假设你有很多这样的 AI,你有某家公司从中获得大量利润。然后另一家公司进来开始竞争。竞争将通过专业化来运作。竞争喜欢专业化。你在市场上看到它,在进化中也看到它。你将有很多不同的生态位,你将有很多不同的公司占据不同的生态位。
在这个世界里,我们可能会说一家 AI 公司在某个非常复杂的经济活动领域确实比其他公司好很多,另一家公司在另一个领域更好。第三家公司在诉讼方面真的很好。
**Ilya Sutskever:** It's where we have these powerful human-like learners that are also… By the way, maybe there's another thing we haven't discussed on the spec of the superintelligent AI that I think is worth considering. It's that you can make it narrow; it can be useful and narrow at the same time. You can have lots of narrow superintelligent AIs. But suppose you have many of them, and you have some company that's producing a lot of profits from them. Then you have another company that comes in and starts to compete. The way the competition is going to work is through specialization. Competition loves specialization. You see it in the market, you see it in evolution as well. You're going to have lots of different niches, and lots of different companies occupying different niches. In this world we might say one AI company is really quite a bit better at some area of really complicated economic activity, a different company is better at another area, and a third company is really good at litigation.
**Dwarkesh Patel:** 这不是与类人学习所暗示的矛盾吗?就是说它可以学习……
**Dwarkesh Patel:** Isn't this contradicted by what human-like learning implies? It's that it can learn…
**Ilya Sutskever:** 它可以,但你有积累的学习。你有大量的投资。你花了大量的计算和经验来变得真正、真正出色、真正非凡地擅长这件事。别人花了大量的计算和经验来变得非常擅长另一件事。你应用了大量的人类式学习来达到那里,但现在你处于这个高点,其他人会说,"看,我不想从头开始学你学过的东西。"
**Ilya Sutskever:** It can, but you have accumulated learning. You have a big investment. You spent a lot of compute and experience to become really, really good, really phenomenal, at this thing. Someone else spent a huge amount of compute and a huge amount of experience to get really good at some other thing. You applied a lot of human-like learning to get there, but now you are at this high point, and someone else would say, "Look, I don't want to start learning what you've learned."
**Dwarkesh Patel:** 我猜那需要很多不同的公司同时开始在类人持续学习 agent 上工作,这样他们可以在不同的分支上开始不同的树搜索。但如果一家公司首先得到那个 agent,或首先得到那个学习者,那么确实看起来……
嗯,如果你想想经济中的每一份工作,让一个实例学习每一份工作,对一家公司来说似乎是可行的。
**Dwarkesh Patel:** I guess that would require many different companies to arrive at the human-like continual learning agent at the same time, so that they can start their tree searches down different branches. But if one company gets that agent first, or gets that learner first, it does then seem like… Well, if you just think about every single job in the economy, having an instance learn each one seems tractable for a single company.
**Ilya Sutskever:** 那是一个有效的论点。我的强烈直觉是事情不会那样发展。论点说会那样发展,但我的强烈直觉是不会。理论上,理论和实践没有区别。在实践中,有区别。我认为这将是其中一种情况。
**Ilya Sutskever:** That's a valid argument. My strong intuition is that it's not how it's going to go. The argument says it will go this way, but my strong intuition is that it will not go this way. In theory, there is no difference between theory and practice. In practice, there is. I think that's going to be one of those.
**Dwarkesh Patel:** 很多人关于递归自我改进的模型确实明确地说,我们将在一个服务器里有一百万个 Ilya,他们在想出不同的想法,这将导致超级智能非常快速地出现。你对你正在做的事情有多大的可并行性有什么直觉?复制 Ilya 有什么收益?
**Dwarkesh Patel:** A lot of people's models of recursive self-improvement literally, explicitly state we will have a million Ilyas in a server that are coming up with different ideas, and this will lead to a superintelligence emerging very fast. Do you have some intuition about how parallelizable the thing you are doing is? What are the gains from making copies of Ilya?
**Ilya Sutskever:** 我不知道。我认为肯定会有递减的回报,因为你想要的是思维方式不同的人,而不是相同的人。如果真的有我的复制品,我不确定你会得到多少额外的增量价值。思维方式不同的人,那才是你想要的。
**Ilya Sutskever:** I don't know. I think there'll definitely be diminishing returns because you want people who think differently rather than the same. If there were literal copies of me, I'm not sure how much more incremental value you'd get. People who think differently, that's what you want.
**Dwarkesh Patel:** 为什么即使你看不同的模型,甚至是由完全不同的公司发布的、在可能不重叠的数据集上训练的模型,LLM 彼此之间实际上有多相似,这件事真的很疯狂?
**Dwarkesh Patel:** Why is it that LLMs are actually so crazily similar to each other, even models released by totally different companies and trained on potentially non-overlapping datasets?
**Ilya Sutskever:** 也许数据集并不像看起来那样不重叠。但有某种感觉,即使一个个体人类可能不如未来的 AI 有生产力,但也许人类团队比 AI 团队拥有更多多样性,这一事实有其价值。
**Ilya Sutskever:** Maybe the datasets are not as non-overlapping as it seems. But there's some sense in which even if an individual human might be less productive than the future AI, maybe there's something to the fact that human teams have more diversity than teams of AIs might have.
**Dwarkesh Patel:** 我们如何在 AI 之间引出有意义的多样性?我认为仅仅提高 temperature 只会导致胡言乱语。你想要的更像是不同的科学家有不同的偏见或不同的想法。你如何在 AI agent 之间获得那种多样性?
**Dwarkesh Patel:** How do we elicit meaningful diversity among AIs? Just raising the temperature results in gibberish. You want something more like the way different scientists have different prejudices or different ideas. How do you get that kind of diversity among AI agents?
**Ilya Sutskever:** 没有多样性的原因,我相信,是因为 pre-training。所有的 pre-trained 模型几乎是一样的,因为它们在相同的数据上 pre-train。现在 RL 和 post-training 是一些差异化开始出现的地方,因为不同的人想出了不同的 RL 训练。
**Ilya Sutskever:** So the reason there has been no diversity, I believe, is because of pre-training. All the pre-trained models are pretty much the same because they pre-train on the same data. Now RL and post-training is where some differentiation starts to emerge because different people come up with different RL training.
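Dwarkesh's aside about temperature is easy to verify with a few lines of arithmetic. This is a toy sketch, not anything from the conversation: raising softmax temperature only flattens the next-token distribution toward uniform, which produces noise, not the kind of principled disagreement different scientists bring.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into a distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits: one clearly preferred token, three poor ones.
logits = [4.0, 1.0, 0.5, 0.0]

for t in [0.7, 1.0, 5.0]:
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])
# As t grows, mass shifts onto tokens the model itself considers bad, and the
# distribution approaches uniform: gibberish, not a different point of view.
```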
**Dwarkesh Patel:** 我听到你过去暗示过 self-play 作为一种方式,要么获取数据,要么将 agent 与同等智能的其他 agent 配对以启动学习。我们应该如何思考为什么没有关于这种东西在 LLM 上起作用的公开提案?
**Dwarkesh Patel:** I've heard you hint in the past about self-play as a way to either get data or match agents to other agents of equivalent intelligence to kick off learning. How should we think about why there are no public proposals of this kind of thing working with LLMs?
**Ilya Sutskever:** 我会说两件事。我认为 self-play 有趣的原因是,它提供了一种仅使用计算、不需要数据就能创建模型的方式。如果你认为数据是最终的瓶颈,那么仅使用计算就非常有趣。所以那就是使它有趣的东西。
问题是,self-play——至少以过去的方式来做——当你有某种互相竞争的 agent 时——它只擅长发展某一类技能。它太窄了。它只擅长谈判、冲突、某些社交技能、策略制定、那种东西。如果你关心那些技能,那么 self-play 将是有用的。
实际上,我认为 self-play 确实找到了一个家,只是以不同的形式。所以像 debate、prover-verifier、某种 LLM-as-a-Judge 也被激励去找出你工作中的错误。你可以说这不完全是 self-play,但这是一种相关的对抗性设置,人们正在这样做,我相信。
真正的 self-play 是 agent 之间更一般的竞争的特殊情况。竞争的自然反应是试图变得不同。所以如果你把多个 agent 放在一起,告诉他们,"你们都需要解决某个问题,你是一个 agent,你在查看其他所有人在做什么,"他们会说,"嗯,如果他们已经在采取这种方法,我不确定我是否应该追求它。我应该追求一些差异化的东西。"
所以我认为类似这样的东西也能创造一种方法多样性的激励。
**Ilya Sutskever:** I would say there are two things to say. The reason why I thought self-play was interesting is because it offered a way to create models using compute only, without data. If you think that data is the ultimate bottleneck, then using compute only is very interesting. So that's what makes it interesting. The thing is that self-play, at least the way it was done in the past—when you have agents which somehow compete with each other—is only good for developing a certain set of skills. It is too narrow. It's only good for negotiation, conflict, certain social skills, strategizing, that kind of stuff. If you care about those skills, then self-play will be useful. Actually, I think that self-play did find a home, just in a different form. Things like debate, prover-verifier, having some kind of LLM-as-a-Judge which is also incentivized to find mistakes in your work. You could say this is not exactly self-play, but it is a related adversarial setup that people are doing, I believe. Really, self-play is a special case of more general competition between agents. The natural response to competition is to try to be different. So if you were to put multiple agents together and you tell them, "You all need to work on some problem, and you are an agent inspecting what everyone else is working on," they're going to say, "Well, if they're already taking this approach, it's not clear I should pursue it. I should pursue something differentiated." So I think something like this could also create an incentive for a diversity of approaches.
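Ilya's last point, that visibility into what other agents are pursuing plus competition yields diversity, can be sketched as a toy incentive. Everything here is invented for illustration (the approach names, the random "private judgment", the crowding penalty); it only shows the mechanism: penalize crowded approaches and the agents spread out.

```python
import random

random.seed(0)

# Hypothetical pool of approaches an agent could pursue.
APPROACHES = ["scale RL", "new architecture", "better data",
              "verifier models", "continual learning", "interpretability"]

def pick_approach(taken_counts):
    """Each agent scores approaches by private judgment minus a crowding
    penalty for how many other agents already pursue them."""
    appeal = {a: random.random() for a in APPROACHES}
    score = {a: appeal[a] - 0.5 * taken_counts.get(a, 0) for a in APPROACHES}
    return max(score, key=score.get)

taken = {}
for agent in range(8):
    choice = pick_approach(taken)
    taken[choice] = taken.get(choice, 0) + 1
    print(f"agent {agent}: {choice}")

print(taken)  # the crowding penalty spreads agents across approaches
```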
**Dwarkesh Patel:** 最后一个问题:什么是研究品味(research taste)?你显然是世界上被认为在 AI 研究中拥有最好品味的人。你是深度学习历史上最重大事件的共同作者,从 AlexNet 到 GPT-3 等等。那是什么,你如何描述你是如何想出这些想法的?
**Dwarkesh Patel:** Final question: What is research taste? You're obviously the person in the world who is considered to have the best taste in doing AI research. You were a co-author on the biggest things that have happened in the history of deep learning, from AlexNet to GPT-3 and so on. What is it? How do you characterize how you come up with these ideas?
**Ilya Sutskever:** 我可以就我自己来评论这一点。我认为不同的人做法不同。
有一件事在个人层面上指引我的,是一种关于 AI 应该是什么样的美学——通过思考人是怎样的,但要正确地思考。很容易错误地思考人是怎样的,但正确地思考人意味着什么?
我给你一些例子。人工神经元的想法直接受大脑启发,而且它是一个伟大的想法。为什么?因为你说大脑有所有这些不同的器官,它有褶皱,但褶皱可能不重要。为什么我们认为神经元重要?因为它们数量很多。这感觉是对的,所以你要神经元。你要某种局部学习规则来改变神经元之间的连接。大脑这样做感觉是合理的。
分布式表示的想法。大脑对经验做出反应,因此我们的神经网络应该从经验中学习。大脑从经验中学习,神经网络应该从经验中学习。
你在某种程度上问自己,某些东西是根本性的还是非根本性的?事情应该是怎样的。我认为那一直在很大程度上指引着我——从多个角度思考,寻找近乎美的东西,美和简洁。丑陋没有容身之地。是美、简洁、优雅、正确的大脑启发。所有这些东西需要同时存在。它们越是同时存在,你就越能对一个自上而下的信念有信心。
自上而下的信念是在实验与你矛盾时支撑你的东西。因为如果你总是相信数据,有时候你可能在做正确的事情但有一个 bug。但你不知道有 bug。你怎么判断有 bug?你怎么知道你应该继续调试还是应该得出结论说这是错误的方向?是自上而下的信念。
你可以说事情必须是这样的。类似这样的东西必须有效,因此我们得继续。那就是自上而下的信念,它基于这种多面的美和大脑的启发。
**Ilya Sutskever:** I can comment on this for myself. I think different people do it differently. One thing that guides me personally is an aesthetic of how AI should be, arrived at by thinking about how people are, but thinking about it correctly. It's very easy to think about how people are incorrectly, but what does it mean to think about people correctly? I'll give you some examples. The idea of the artificial neuron is directly inspired by the brain, and it's a great idea. Why? Because you say the brain has all these different organs, it has the folds, but the folds probably don't matter. Why do we think that the neurons matter? Because there are many of them. It kind of feels right, so you want the neuron. You want some local learning rule that will change the connections between the neurons. It feels plausible that the brain does it. The idea of distributed representations. The idea that the brain responds to experience, therefore our neural net should learn from experience. The brain learns from experience; the neural net should learn from experience. You kind of ask yourself, is something fundamental or not fundamental? How should things be? I think that's been guiding me a fair bit: thinking from multiple angles and looking for almost beauty, beauty and simplicity. Ugliness: there's no room for ugliness. It's beauty, simplicity, elegance, correct inspiration from the brain. All of those things need to be present at the same time. The more they are present, the more confident you can be in a top-down belief. The top-down belief is the thing that sustains you when the experiments contradict you. Because if you trust the data all the time, well, sometimes you can be doing the correct thing but there's a bug. But you don't know that there is a bug. How can you tell that there is a bug? How do you know if you should keep debugging or conclude it's the wrong direction? It's the top-down belief. You can say things have to be this way. Something like this has to work, therefore we've got to keep going. That's the top-down belief, and it's based on this multifaceted beauty and inspiration from the brain.
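The "local learning rule" Ilya mentions has classic concrete instances. As a hedged illustration, here is Oja's rule, a textbook example rather than anything Ilya endorses in the conversation: each connection updates using only the activity of the two neurons it joins, with no global error signal, and the weights still converge to a structured feature (here, the shared component of the inputs).

```python
import random

random.seed(0)
n_in, lr = 4, 0.01
w = [random.uniform(-0.1, 0.1) for _ in range(n_in)]  # one output neuron

for _ in range(2000):
    s = random.gauss(0, 1)                                   # shared latent signal
    x = [s + 0.3 * random.gauss(0, 1) for _ in range(n_in)]  # correlated inputs
    y = sum(wi * xi for wi, xi in zip(w, x))                 # post-synaptic output
    # Oja's rule: Hebbian growth (y * x_i) plus a local decay (y^2 * w_i)
    # that keeps the weight vector bounded. Every term is local to the synapse.
    w = [wi + lr * (y * xi - y * y * wi) for wi, xi in zip(w, x)]

print([round(wi, 2) for wi in w])  # roughly equal weights, unit norm overall
```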
**Dwarkesh Patel:** 好吧,我们就到这里。Ilya,非常感谢你。
**Dwarkesh Patel:** Alright, we'll leave it there. Ilya, thank you so much.
**Ilya Sutskever:** 好的。感谢。那太棒了。
**Ilya Sutskever:** Alright. Appreciate it. That was great.
**Dwarkesh Patel:** 是的,我很享受。
**Dwarkesh Patel:** Yeah, I enjoyed it.
**Ilya Sutskever:** 是的,我也是。
**Ilya Sutskever:** Yes, me too.