**Mike Abbott:** 我们看看效果吧。
**Mike Abbott:** We'll see.
**Anjney Midha:** 挺酷的。
**Anjney Midha:** That's cool.
**Mike Abbott:** 我知道。看起来很专业。(笑)我刚才还在想 Rosie,四年前的时候。
**Mike Abbott:** I know. It's pretty nice. It looks so professional. [laughter] I was just laughing, though, thinking about Rosie like four years ago.
**Anjney Midha:** 是啊。
**Anjney Midha:** Yeah.
**Mike Abbott:** 全靠编嘛。
**Mike Abbott:** Making [...] up.
**Anjney Midha:** 好了,我们已经在直播了。
**Anjney Midha:** So, we're live. Okay.
**Anjney Midha:** 好的。好的。欢迎来到 CS153 Office Hours。
**Anjney Midha:** Okay. Great. Welcome to the CS153 office hours.
**Anjney Midha:** 这是我们第一次直播 office hours,这算是一个实验。我们收到了很多需求,大家想要更多时间跟我们以及一些嘉宾交流。接下来几周我们会尝试把世界各地的人拉进来,但今天就只有 Mike 和我。有意思的是,这门课我们已经教了四年了,这还是第一次正式做 office hours——真正意义上的 office hours。
**Anjney Midha:** This is our first live-streamed office hours, so this is an experiment. We got a lot of demand for more time with us and some of the speakers, so over the next few weeks we'll try to actually bring in people from around the world, but today it's just Mike and me. And, you know, what's interesting is that this is the fourth year we're doing the class, and it's the first time we've done office hours. Like, really office hours.
**Mike Abbott:** 我平时也做 office hours 的。
**Mike Abbott:** I do office hours.
**Anjney Midha:** 我会留出时间给学生,但不像这种正式的。
**Anjney Midha:** I would make time for students, but nothing official like this.
**Mike Abbott:** 对,对。去年我搞了两场,每场三小时,学生们一个接一个来。
**Mike Abbott:** Yes. Yes. There were two sessions I did last year that were like three-hour back-to-backs where students came in.
**Anjney Midha:** 好吧。
**Anjney Midha:** Okay.
**Mike Abbott:** 我们做了分诊安排——
**Mike Abbott:** And we kind of triaged them.
**Anjney Midha:** 不知道你怎么逃过去的。二十分钟。我也不知道你是怎么逃过去的。(笑)
**Anjney Midha:** 20 minutes. I don't know how you got out of that. [laughter]
**Mike Abbott:** 我确实做了。但那个价值挺高的。
**Mike Abbott:** I definitely did that. But it seemed like that was high value.
**Anjney Midha:** 当然了。
**Anjney Midha:** Oh yeah, for sure.
**Mike Abbott:** 不过我们进入正题吧——我会跟同学们说,但大家随时可以联系 Andreessen Horowitz,无论是专业问题还是创业问题。我大概每周会跟四五个之前上过这门课的学生聊天。
**Mike Abbott:** But let's actually... Well, I'll tell the classes, but any of you should always feel free to reach out to Andreessen Horowitz if you have questions, professional questions, or if you're starting a company. I probably talk to four or five ex-students from this class a week, actually.
**Anjney Midha:** 好吧,所以你比我——我更擅长批量处理。
**Anjney Midha:** Okay. So you're much better at that than I am. I'm better at batching.
**Mike Abbott:** 你更擅长管理时间。
**Mike Abbott:** Well, you're better at managing your time.
**Anjney Midha:** 不是,我就是更擅长批量处理。
**Anjney Midha:** No, that's not true. I'm just better at batching.
**Mike Abbott:** 好吧,这样更高效。不过各有各的方式。我们会尽量把所有问题都过一遍。但我得先道个歉,你们这周发了很多邮件过来。我收到了大概六百封邮件,还在慢慢看。课程团队和我们会想办法分诊处理。很多问题其实是重复的,因为有些——很多人想知道能不能从候补名单上下来,这个显然没法预测。
**Mike Abbott:** Well, this is more efficient. Yeah. I mean, no, to each their own. But we will try to get through all the questions as best I can. I will apologize, though: many of you did email in this week. I think I got something like 600 emails, and I'm still digging through them. The course team and us will figure out how to triage things. Many of them are actually repetitive, because there are a lot of questions from people wanting to know if they're going to get off the waitlist, and obviously there's no way to predict that.
**Anjney Midha:** 对,我觉得这取决于退课情况。
**Anjney Midha:** Yeah. So I think that one just depends on the course drops.
**Mike Abbott:** 对,我们对这个没什么控制力。
**Mike Abbott:** Yeah, we don't really have a lot of control over that.
**Anjney Midha:** 基本上取决于已经选上课的同学有多少人在下周退课。所以大家只能等着看了。
**Anjney Midha:** It depends basically on how many people who are already enrolled drop it next week. So you just have to wait and watch, guys.
**Anjney Midha:** 好。我们有什么问题?好的,来吧 Michael。哦,这有个问题——在你的讲座中你展示了 scaling laws 在 pre-training、mid-training、post-training 中都成立。有没有证据表明它们开始失效?还是说上限就是算力?这是个好问题。Scaling laws 对哪个领域?这很重要。我认为在编程方面,我们看到它们是成立的,但人们对 scaling laws 的含义有误解。它不是说你多投一个单位的算力就能多获得一个单位的能力,不是这样运作的。它的意思是,如果你把算力和合适数量的数据结合在一起,就可以在某个分布上持续、可预测地提升能力。
**Anjney Midha:** All right. Do we have questions? Yes. All right, go for it, Michael. Oh, here's a question: in your lecture you showed scaling laws hold across pre-training, mid-training, and post-training. Is there any evidence they start to break down, or is the ceiling just compute? So yeah, this is a great question. Scaling laws for which domain? This is important, right? I think for coding, what we have seen is they're holding, but people have misunderstood what the scaling laws mean. It doesn't mean that just because you put in one more unit of compute, you get one more unit of capability out. That's not how it works. It just means that you can keep predictably improving capabilities on some distribution if you combine compute with the right amount of data.
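That pairing of compute with the right amount of data is exactly what Chinchilla-style scaling fits capture. A minimal sketch (the coefficients are roughly the ones reported in the Chinchilla paper; treat them as illustrative, not current frontier values):

```python
# Chinchilla-style scaling law: loss improves predictably only when
# parameters N and training tokens D are scaled together.
# Coefficients are approximately the published Chinchilla fit; illustrative only.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Predicted pre-training loss for a model with n_params trained on n_tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# 100x more parameters with the same data hits a floor set by the data term;
# scaling data alongside parameters keeps the curve moving.
small = predicted_loss(1e9, 20e9)          # 1B params, 20B tokens
params_only = predicted_loss(1e11, 20e9)   # 100B params, same 20B tokens
balanced = predicted_loss(1e11, 2e12)      # 100B params, 2T tokens
```

Plugging in numbers gives `balanced < params_only < small`: more compute always helps a little, but it only helps predictably when data scales with it, and the irreducible term `E` is the ultimate ceiling on any distribution.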
**Mike Abbott:** 对。
**Mike Abbott:** Yep.
**Anjney Midha:** 而且是在一个可验证的领域。
**Anjney Midha:** And in a verifiable domain.
**Anjney Midha:** 在一个可验证的领域。所以在编程领域这一点是成立的。在材料科学方面——我们今天还在 Periodic 坐着呢——也是成立的。在视觉智能方面,就像我们跟 Andy 讨论的那样,也是成立的。所以总体来说,能力确实随着算力可预测地扩展。
**Anjney Midha:** In a verifiable domain. And so in coding, that's holding. In materials science (we were sitting at Periodic today) that's holding. In visual intelligence, like we've had with Andy, that's holding. So yeah, I think, broadly speaking, capabilities scale predictably with compute.
**Mike Abbott:** 不过有趣的是,我参与了 Open Athena——一个做科学领域 AI 研究的非营利组织。当我们看一些海洋学之类的项目时,在某些科学领域其实不太清楚算力是否真的与正在构建的模型直接相关。
**Mike Abbott:** It's interesting, though, because I'm involved with Open Athena, which is a nonprofit doing AI for science, and when we look at some of the projects we're doing, like in oceanography and whatnot, it's unclear in some of the scientific domains whether compute is really related to the models being created there.
**Anjney Midha:** 对。这就是关键——没有什么能替代找到正确的架构和算法来高效学习。
**Anjney Midha:** Yeah. So this is the key thing: there's no substitute for finding the right architecture and algorithm that learns efficiently.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 在你取得那个突破之前,你不想过早地扩展算力,否则就是在烧钱。
**Anjney Midha:** And until you have that breakthrough, you don't want to prematurely scale compute, because then you're just throwing compute money away.
**Mike Abbott:** 是的,在烧钱。
**Mike Abbott:** You're burning money.
**Anjney Midha:** 是的。是的。是的。
**Anjney Midha:** Yeah. Yeah. Yeah.
**Anjney Midha:** 但如果你找到了正确的架构,在你的消融实验和研究中你找到了正确的数据表征方式,那之后"越大越好"的规律就会成立。
**Anjney Midha:** But if you find the right architecture, and in your ablations, in your research, you figure out the right representation for the data, then once that's happened, I find the bitter lesson holds.
**Mike Abbott:** 我确实认为 transformer 模型——或者它的各种变体,就像我们昨天从 Andy 那里听到的——证明了它比人们想象的要可扩展得多。
**Mike Abbott:** Mhm. And I do think the transformer model, or flavors of it, like we heard from Andy yesterday, is proving to be much more scalable than people realized.
**Anjney Midha:** 对。
**Anjney Midha:** Yeah.
**Mike Abbott:** 但这不意味着你应该在准备好之前就往问题上砸算力。
**Mike Abbott:** But that doesn't mean you should start throwing compute at the problem before you're ready.
**Anjney Midha:** 这就是我所说的算力最优扩展(compute optimal scaling)。
**Anjney Midha:** That's what I would call compute-optimal scaling.
**Mike Abbott:** 对。
**Mike Abbott:** Yeah.
**Anjney Midha:** 所以在海洋学这个例子中,作为一个新前沿,我们可能还处于研究阶段。
**Anjney Midha:** And and so in the oceanography example, which is a new frontier, we're probably in the research phase.
**Mike Abbott:** 没错。
**Mike Abbott:** That's right.
**Anjney Midha:** 因为它大概比前沿领域落后两三年。
**Anjney Midha:** Because it's probably about two or three years behind.
**Mike Abbott:** 对,就是这样。
**Mike Abbott:** There we go. Yeah.
**Anjney Midha:** 我觉得这完全正确。
**Anjney Midha:** I think that's exactly right.
**Mike Abbott:** 最终,只要合适的人才去攻克这个问题,有正确的数据,我完全预期如果能建立起那个可验证性循环,进展应该会很快。然后就是算力的问题了。
**Mike Abbott:** Yeah. So I think, at the end of the day, as long as the right talent attacks that problem and they have the right data, I would totally expect that if it can get that verifiability loop going, progress should happen there pretty fast. And then it's about compute.
**Anjney Midha:** 对,我觉得你说得对。有道理。好,下一个问题。Mid-training 实际上在做什么,跟 fine-tuning 有什么不同?只是在更精心策划的数据上投入更多算力吗?这是个好问题。简短的回答是 mid-training 没有标准定义,但通常可重复的做法是:你拿基础模型做扩展预训练。
**Anjney Midha:** Yeah, I think you're right. That makes tons of sense. Okay, we got another question: what's actually happening during mid-training that's different from fine-tuning? Is it just more compute on more curated data? This is a great question, and the short answer is that there's no standard definition of mid-training, but the repeatable thing is that you take the base model and do extended pre-training.
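One way to see the distinction the question is asking about is in the loss itself: mid-training keeps the pre-training objective (every next token is supervised), while supervised fine-tuning masks the prompt and grades only the response. A minimal sketch, with illustrative names rather than any real framework's API:

```python
# Sketch of the objective difference (names are illustrative, not a real API).

# Mid-training: the same next-token loss as pre-training,
# just continued on more (often curated) tokens.
def midtrain_step(model, token_ids, loss_fn):
    logits = model(token_ids[:-1])
    return loss_fn(logits, token_ids[1:])   # every position is supervised

# Supervised fine-tuning: same next-token loss, but only
# the response tokens contribute to it.
def sft_step(model, prompt_ids, response_ids, loss_fn):
    ids = prompt_ids + response_ids
    logits = model(ids[:-1])
    targets = ids[1:]
    start = len(prompt_ids) - 1   # first position whose target is a response token
    return loss_fn(logits[start:], targets[start:])
```

The only structural change between the two is the loss mask; "more compute on more curated data" with no mask is what makes mid-training look like pre-training rather than fine-tuning.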
**Anjney Midha:** 你不是在做监督微调或强化学习,就是字面意义上的扩展预训练——在预训练数据集中加入更多 token,针对那个基础模型的特定能力。好,考虑到你展示的历史大宗商品周期,你觉得什么时候我们能看到 GPU 价格出现实质性回调?
**Anjney Midha:** You're not really doing supervised fine-tuning or reinforcement learning. It's literally just extended pre-training, with more tokens in the pre-training data set, for that foundation model, for that particular capability. Yeah. Um, given the historical commodity cycles you showed, when do you think we'll see a meaningful GPU price correction?
**Mike Abbott:** 嗯,这是百万美元——不,万亿美元的问题。
**Mike Abbott:** Um, it's the million-dollar question or a trillion dollar question.
**Anjney Midha:** 对,确实。这是个好问题。我觉得有几件事在发生。首先,对于那些无法扩展的集群,GPU 价格存在一个不连续性。如果是小集群,对前沿团队来说其实没那么有价值,因为你希望能够优雅地扩展。所以如果只是一个——说出来挺疯狂的——只是一个 512 芯片的集群,几年前这还算大的。那是每年七八百万美元的集群,但现在对于认真做训练的团队来说,这几乎太小了。你觉得做正经训练的最小规模是多少?集群要多大?
**Anjney Midha:** Yeah. Yeah. So, it's a good question. I think there's a couple of things going on. One is, I would say, there's a discontinuity in the GPU price for clusters that don't scale. If it's a small cluster, it's actually not that valuable for a frontier team to use, because you want to be able to scale pretty elegantly. So if it's just a 512-chip cluster (it's crazy for me to say this; a few years ago that was big), that's a $7 million to $8 million a year cluster, and that's almost too small now for a team doing serious training. What do you think the minimum size is to do serious training? How big does that cluster have to be?
**Mike Abbott:** 我认为今天要做有意义的研究、有意义的规模化——如果你看看超大规模厂商等等——
**Mike Abbott:** I think it's very hard to do meaningful research today, meaningful scaling, especially if you look at the hyperscalers and so on.
**Anjney Midha:** 我觉得少于 4K 到 6K 作为基准的话就不太有意思了。
**Anjney Midha:** I think less than 4K to 6K as a baseline is just not that interesting.
**Mike Abbott:** 对。
**Mike Abbott:** Yeah.
**Anjney Midha:** 所以我认为对于更小的团队和推理来说,那些集群反而更好用。旧的 H100 对推理很有价值,对低于规模的研究也很有价值。
**Anjney Midha:** And so I think for smaller teams and for inference, those clusters are much better. Older H100s are super valuable for inference, and super valuable for research that's subscale.
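The "$7 to $8 million a year" figure for a 512-chip cluster mentioned a moment ago is easy to sanity-check if you assume a rental rate somewhere around $1.60 to $1.80 per GPU-hour (an assumed range for illustration, not a quoted price):

```python
# Back-of-envelope annual cost of a 512-chip cluster.
# The $/GPU-hour rates below are assumptions, not quoted prices.
CHIPS = 512
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_cost(rate_per_gpu_hour: float) -> float:
    """Annual cluster cost at a given per-GPU-hour rental rate."""
    return CHIPS * HOURS_PER_YEAR * rate_per_gpu_hour

low = annual_cost(1.60)   # ~$7.2M/year
high = annual_cost(1.80)  # ~$8.1M/year
```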
**Mike Abbott:** 但一旦你有了那个突破,你就需要上到 4K、6K、10K。所以在 AMP 这边,
**Mike Abbott:** But then once you have that breakthrough, then you need to be on 4K, 6K, 10K. So at AMP,
**Anjney Midha:** 我们真正兴奋的是看到一个集群从 4K 起步,但能扩展到 16K、20K。
**Anjney Midha:** I mean, we get really excited when we see a cluster that starts at something like 4K, but can scale to 16K, 20K.
**Mike Abbott:** 嗯。
**Mike Abbott:** Mhm.
**Anjney Midha:** 我觉得这是今天的最优形态。现在我很确定一年后回看今天会觉得,这是什么?到时候会是 8K、10K。我觉得这是趋势方向。有意思。算力标准化在实践中会是什么样子?有没有具体的技术或监管举措能推动这件事?
**Anjney Midha:** Um, that I find is the optimal shape today. Now, I'm pretty sure that a year from today it'll be, what is that? It'll be 8K, 10K. I think that's where the trend is. Interesting. Um, what would standardization of compute actually look like in practice? Is there a specific technical or regulatory move that unlocks it?
**Anjney Midha:** 这件事的终极版本——大想法——就是一个通用内核(universal kernel)。
**Anjney Midha:** So the grand unified version of this, right, the big idea would be a universal kernel.
**Mike Abbott:** 对。
**Mike Abbott:** Yep.
**Anjney Midha:** 想象你可以直接跟 Claude 说:"嘿 Claude,这个研究、这个消融实验效果很好。部署。"然后 Claude 或某个 AI 去搞清楚如何在一个大型多租户集群上部署这个东西,完成所有训练,你不需要担心是哪个芯片组,它就是高效的。然后你按下部署,自动扩展。你把所有硬件都抽象掉了——那个通用内核,算力就是算力,你不用担心任何底层设施。这就是梦想。但同时,一些供应商并没有动力去做这个超级内核。我觉得短期内他们可能没有,但如果你看看我们讨论过的工业革命历史——它会收敛。交流电/直流电标准化的主要受益者其实是发电公司,因为标准化让生产力更高、资源消耗更稳定,这意味着更多创新,进而随着时间推移对电力的总体需求增加。所以你必须跳出季度财报的思维模式来看——也许这个季度我们在训练方面丢了一点市场份额,但我们在推理方面获得了更多份额,因为有了更多的可替代性。然后随着时间推移,股价不再是那种大起大落,而是稳步上升。
**Anjney Midha:** Right. Where you just load up a workload. I mean, imagine if you could just talk to Claude and say, "Hey Claude, this research, this ablation was really good. Deploy." And Claude, or some AI, goes and figures out how to deploy this on a large multi-tenant cluster, does all the training, and you don't have to worry about which chipset it is; it's just efficient. Then you hit deploy and it autoscales on its own. You abstract away all the hardware: that universal kernel, where flops are flops are flops, and you don't have to worry about any of the infrastructure. That's the dream. But at the same time, some of the vendors are not incented for that uber-kernel. I think in the short term they may not be, but if you look at the history of the industrial revolution that we talked about, it converges. It turns out the major beneficiaries of AC/DC standardization were actually power generation firms, because standardization allows more productivity and more stable consumption of the resource, which means more innovation, which means overall demand for electricity increases over time. So you have to look a little bit past the quarterly-earnings grind and say, "Yeah, maybe this quarter we lose a little bit of market share in training, but we're getting more market share in inference because there's more fungibility." And then over time, instead of stock prices doing this and then this, we'll do this.
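The "universal kernel" idea at the top of that answer, where flops are flops and the chipset is invisible to the workload, can be caricatured as a tiny scheduling interface. Everything here is hypothetical; no such scheduler exists today:

```python
# Toy sketch of the "universal kernel" idea: a workload is described once,
# and a scheduler picks hardware. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Workload:
    flops_needed: float   # total FLOPs for the job
    kind: str             # "train" or "inference"

@dataclass
class Cluster:
    name: str
    chip: str             # chipset is an implementation detail the caller never sees
    flops_per_sec: float

def schedule(w: Workload, clusters: list) -> tuple:
    """Pick the fastest cluster and return (cluster, estimated seconds).
    Under a universal kernel, the caller only reasons in FLOPs."""
    best = max(clusters, key=lambda c: c.flops_per_sec)
    return best, w.flops_needed / best.flops_per_sec
```

The point of the sketch is the signature: nothing about CUDA, ROCm, or TPU runtimes leaks into `Workload`, which is precisely the fungibility being described.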
**Mike Abbott:** 对。
**Mike Abbott:** Mhm.
**Anjney Midha:** 这是每个人想要的,对吧?我们就想要稳定的、可预测的、快速增长的曲线。
**Anjney Midha:** And that's what everybody wants, right? We just want stable, predictable, fast-scaling growth curves.
**Mike Abbott:** 我觉得问题就在于因为缺乏可替代性、没有标准,我们才不断看到这种波动。
**Mike Abbott:** And I think the problem is that because of this lack of fungibility, and because there are no standards, we keep getting this.
**Anjney Midha:** 对,会变得锯齿状。有道理。我们看看下一个问题。如果上下文是一种新的护城河,那一个没有专有数据的创业公司该怎么办?有没有路径可以走进去?你怎么看?
**Anjney Midha:** Yeah, it gets jagged. Yeah, that makes sense. Um, let's see, next question: if context is the new moat, what should a startup without proprietary data do? Is there any path in? What do you think?
**Mike Abbott:** 哦,我觉得有非常多的路径。
**Mike Abbott:** Oh, I think there's many many paths in.
**Anjney Midha:** 我觉得一个大机会——我们在课上一直在讨论的——就是新前沿。
**Anjney Midha:** I think one big opportunity which we've been talking about in the class is new frontiers.
**Mike Abbott:** 对,到处都有数据池,包括个人数据池。我觉得这是 Apple 最引人注目的一点——一台最新一代的 MacBook,你可以在上面做一些数据生成,相对于三四年前来说已经很有意义了。
**Mike Abbott:** Yeah, there are so many data pools all over the place, including personal data pools. I think this is one of the most compelling things about Apple, right: on a latest-generation MacBook, you can do some data generation that is meaningful relative to three or four years ago.
**Anjney Midha:** 嗯。比如我一直在做自己的项目,基本上是从我过去四五年的笔记中生成数据。如果你用 LLM 去回溯阅读你的笔记、做标注、结合时间戳——我实际上还把心电图传感器数据放进去了。
**Anjney Midha:** Mhm. Like, I've been working on my own project, which is basically generating data from my notes from the last four or five years, where you use an LLM to go back and read your notes and annotate them, combining timestamps with... I actually put in my heart sensor data from my ECG.
**Mike Abbott:** 突然间你就有了一个把日历和——
**Mike Abbott:** Now suddenly you've got a representation that combines your daily calendar with your
**Anjney Midha:** 你是放在 MD 文件里吗?
**Anjney Midha:** Are you putting it in just like an MD file?
**Mike Abbott:** 对,就是 MD 文件。不不不,MD 文件是 prompt。
**Mike Abbott:** Yeah, literally MD... no, no, no, the MD files are the prompts.
**Anjney Midha:** 对,但如果你看 Obsidian,你可以存很多这种数据。
**Anjney Midha:** Yeah, but like if you look at Obsidian like you can store a lot of that data.
**Mike Abbott:** 我很多笔记都在 MD 文件里。心电图的东西是 CSV,所以你得做一些数据管道的工作。但现在的编程模型做这些事很擅长。所以我觉得有很多很多前沿领域,你可以说:这是一种新的、以前从未被这样表征过的数据集组合,你可以在上面很快提升智能。所以一个方向就是新前沿、新领域、新任务、新能力——大厂没有聚焦的地方。
**Mike Abbott:** A lot of my notes are in MD files. And the ECG stuff is actually in CSVs, so you've got to do a bit of data plumbing, but the coding models are really good at this stuff. And so I just think there are lots and lots of frontiers where you can say: here's a new form, a combination of different data sets that have never been represented before in a unique way, and you can increase intelligence on that very fast. So one path is just new frontiers, new areas, new tasks, new capabilities that the big guys are not focused on.
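The "data plumbing" step being described, joining timestamped markdown notes with timestamped sensor CSVs, might look something like this. The column names (`timestamp`, `bpm`) and the 30-minute matching window are made-up assumptions for illustration:

```python
# Sketch: join note timestamps to the nearest ECG reading from a CSV export.
# Column names and the matching window are assumptions, not a real device schema.
import csv, io
from datetime import datetime, timedelta

def load_ecg(csv_text: str):
    """Parse rows like 'timestamp,bpm' into dicts with real datetimes."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({"ts": datetime.fromisoformat(row["timestamp"]),
                     "bpm": int(row["bpm"])})
    return rows

def nearest_reading(note_time, readings, window=timedelta(minutes=30)):
    """Sensor row closest to note_time, or None if nothing falls in the window."""
    best = min(readings, key=lambda r: abs(r["ts"] - note_time))
    return best if abs(best["ts"] - note_time) <= window else None
```

Each matched (note, reading) pair then becomes one annotated record in the combined data set, which is the representation the LLM annotates over.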
**Anjney Midha:** 第二个方向是敏感数据。这也是为什么 Mistral 如此有价值——有大量关键任务级的政府数据、军事数据,他们在欧洲的合作伙伴信任他们来处理。但还有很多其他场景中,企业或客户可能不想把数据分享出去。
**Anjney Midha:** And then the second one is sensitive data. You know, that's where Mistral is so valuable, because there's a bunch of mission-critical government data, military data, for example, that their partners in Europe trust them with. But there are lots of other places where an enterprise or customer might not want to share that data either.
**Mike Abbott:** 是的,有道理。你说过进展在容易验证的领域最快。你觉得哪些领域接下来会被解锁?我的意思是,你提到了材料科学。
**Mike Abbott:** Yeah, that makes sense. Um, you said progress is fastest in easily verifiable domains. Which domains do you think are next to get unlocked? I mean, you mentioned materials science.
**Anjney Midha:** 嗯,是的。所以这里肯定有工程学,像物理和化学,推理方面我们发现非常可验证。
**Anjney Midha:** Well, yes. So definitely engineering, like physics and chemistry reasoning, which we're finding is very, very verifiable.
**Anjney Midha:** 你知道,材料学是一个非常通用的领域,因为它结合了物理和化学,而且非常可验证。但如果你验证了材料属性,结果是模型在物理和科学方面——特别是物理和化学方面——整体上都变得更好了。所以那确实是一个——我非常看好。
**Anjney Midha:** You know, the thing is, materials is a very general-purpose domain, because it combines physics and chemistry, and it's very verifiable. But if you verify materials properties, it turns out the models get better at physics and science in general, and physics and chemistry in particular. So that's really... I'm very bullish on that.
**Mike Abbott:** 我能问个关于这个的问题吗?你的意思是——这些实验室面临的一个大挑战归结为评估(eval),对吧?
**Mike Abbott:** Can I ask a question on that? So one of the big challenges in these labs comes down to evals, right?
**Anjney Midha:** 是的。
**Anjney Midha:** Yeah.
**Mike Abbott:** 所以当你看这些其他领域——比如材料科学——评估是怎么做的?有两三种不同类型的评估。一种就是直接测量——在这个例子里是超导性。
**Mike Abbott:** And so when you look at these other domains, like materials science, how do the evals work? So, there are two or three different types of evals. One is just a straight-up measurement; in the case here, it's superconductivity.
**Anjney Midha:** 所以你可以直接测量材料的电阻。非常可验证。
**Anjney Midha:** So you can just measure the resistance of the material. That's very verifiable.
**Mike Abbott:** 但在工程进展方面,我经常看的评估指标是——一个科学家在没有 Periodic agent 的情况下完成这个任务需要多长时间。
**Mike Abbott:** But in terms of engineering progress, the eval I often look for is: how much time would it have taken a scientist to do this task without the Periodic agent?
**Anjney Midha:** 然后你做一个对比评估——有 agent 的情况下实验速率的生产力提升。
**Anjney Midha:** Yeah, and then you do a side-by-side eval of the productivity gains in the rate of experimentation with the agent.
**Mike Abbott:** 是的,非常有道理。我在想 EVO 2 项目,在 Arc Institute 那边,我知道科学家们在用那个模型来加速实验,就像你说的那样——对,非常有道理。好,所以问题是哪些领域——材料、物理、科学这些是可验证的,因为现实是可验证的。所以 Periodic 在做的那种——从现实中进行验证——我非常看好。其他领域——机器人学也是非常可验证的,你知道,就像你从 Andy 那里听到的,结果那也是物理验证——你确实可以判断一个机器人是否正确完成了某个合成任务。
**Mike Abbott:** Yeah, that makes tons of sense. I mean, I'm just thinking of the Evo 2 project over at Arc Institute, and I know that scientists are using that model to basically accelerate experiments, to your point. Yeah, that makes tons of sense. Okay, so the question was which domains. So materials, physics, science: verifiable, because reality is verifiable. So wherever, like what Periodic is doing, there's verification from reality, I'm very bullish. Other areas: robotics is quite verifiable, you know, as you're hearing from Andy. It turns out that's also physical verification, where you can actually tell whether a robot synthesized something correctly or not.
**Anjney Midha:** 嗯。
**Anjney Midha:** Mhm.
**Mike Abbott:** 因为那是可测量的。基本上,在整个工业工程领域,有物理的定量指标,非常容易。非常容易。
**Mike Abbott:** Because that's measurable. Basically, throughout the industrial engineering world, there's a physical, quantitative metric. Very easy. Very easy.
**Anjney Midha:** 你怎么看那些定性领域?
**Anjney Midha:** What do you think about these qualitative domains?
**Mike Abbott:** 创意写作是最难的。
**Mike Abbott:** Creative writing is the hardest one.
**Anjney Midha:** 是的。
**Anjney Midha:** Yeah.
**Mike Abbott:** 太难了。
**Mike Abbott:** So hard.
**Anjney Midha:** 是的。模型在创意写作方面仍然很差,我不确定那是否会进步,但你知道那是一个我很想看到学生们去更多攻克的领域,因为我觉得如果你正确地策展数据,有足够的算力,也许我们应该能得到有品味的写作模型。我们还没有,但我很想资助一个这方面的项目。那将会是一个非常精选的数据集。而且你不需要那么多数据来启动后训练的飞轮。
**Anjney Midha:** Yeah. Models are still so bad at creative writing, and I don't know if that will progress, but that's one domain I'd love to see students actually attack more, because I think if you curate the data correctly and have enough compute, maybe we should be able to get good, tasteful writing models. We don't have them yet, but I would love to fund a project around that. That's going to be, like you're saying, a very curated data set. And you don't need that much data to bootstrap some of the post-training flywheels.
**Mike Abbott:** 你需要多少数据?
**Mike Abbott:** How much data do you need?
**Anjney Midha:** 用 DeepSeek 的例子来说,就是几千个精心制作的样本。所以几千个样本。你知道,有一件事——我记得大概一年前跟一个在 OpenAI 的朋友聊,他想获取 Andreessen Horowitz 过去十年所有投资备忘录的访问权限,他说,"嘿,如果我们能拿到那些,那将是最丰富的手工策展、手工精制的投资备忘录数据集之一。"
**Anjney Midha:** I mean, with DeepSeek it was a few thousand samples that were just really well crafted. So, a few thousand samples. You know, I remember talking to a friend at OpenAI about a year ago who was trying to get access to all the memos at Andreessen Horowitz, the investment memos over ten years, and he said, "Hey, if we could get access to that, it would be one of the richest data sets of hand-curated, handcrafted investment memos."
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 理论上你应该能创建一个 fine-tuning 后的模型,非常擅长写那种东西。
**Anjney Midha:** Like, theoretically, you should be able to create a fine-tune of a model that's really good at writing like that.
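Packaging a few thousand hand-curated documents like those memos into a fine-tuning set is mostly formatting work. A sketch of the common JSONL prompt/completion shape (the field names are illustrative, not any specific provider's schema):

```python
# Sketch: turn curated (summary, memo) pairs into JSONL fine-tuning records.
# "prompt"/"completion" field names are illustrative, not a provider's schema.
import json

def to_sft_records(memos):
    """memos: list of (company_summary, full_memo_text) pairs."""
    return [
        {"prompt": f"Write an investment memo for: {summary}",
         "completion": memo}
        for summary, memo in memos
    ]

def dump_jsonl(records):
    """One JSON object per line, the usual fine-tuning file format."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)
```

A few thousand such lines is the scale of data set being discussed; the value is almost entirely in how well each completion is crafted, not in the volume.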
**Mike Abbott:** 是的。是的。是的。
**Mike Abbott:** Yeah. Yeah. Yeah.
**Anjney Midha:** 所以我对小规模数据集非常看好——就像那个例子——不过它几乎是半结构化的。
**Anjney Midha:** And so that one I'm very bullish on: small data sets like that. Though in that example, it's almost semi-structured.
**Mike Abbott:** 是半结构化的。是的。
**Mike Abbott:** It's semi-structured. Yeah.
**Anjney Midha:** 所以这给了一点优势。
**Anjney Midha:** So that gives it a little bit of an advantage.
**Mike Abbott:** 是的。是的。
**Mike Abbott:** Yes. Yeah.
**Anjney Midha:** 下一个问题。DeepSeek 声称用很少的成本训练了一个前沿模型。这是否打破了你展示的"算力等于收入"的关联?还是只是同一条曲线上的不同点?
**Anjney Midha:** Yeah. Um, next question. DeepSeek claimed to train a frontier model for a fraction of the cost. Does that break the compute-equals-revenue correlation you showed, or is it just a different point on the same curve?
**Mike Abbott:** 不,我不太确定我理解这个问题。你能再读一遍吗?
**Mike Abbott:** No, I'm not sure I understand the question. Could you read it again?
**Anjney Midha:** 好,我再读一遍。DeepSeek 声称用很少的成本训练了一个前沿模型。
**Anjney Midha:** Yeah, I'll read it again. DeepSeek claimed to train a frontier model for a fraction of the cost.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 这是否打破了算力与收入的关联?
**Anjney Midha:** Does that break the compute-revenue correlation?
**Mike Abbott:** 我明白了。不不不。不会。看,第一次做某件事总是很贵的。第二次或第三次做同样的事情在任何领域都更便宜。
**Mike Abbott:** I see. I see. No, no, no, it doesn't. Look, doing things the first time is very expensive. Doing things a second or third time is always cheaper, in any domain.
**Anjney Midha:** DeepSeek 做的事情之一是——他们不是第一个。他们不是——我不觉得他们是第二个推理模型——是第三个。
**Anjney Midha:** And one of the things about DeepSeek is that they were not the first. They weren't even the second reasoning model, I don't think; they were the third.
**Mike Abbott:** 是的。所以我不觉得那——我就称之为算力最优扩展——你知道,一旦你知道怎么做,第二次或第三次总是更容易更便宜。
**Mike Abbott:** Yeah. So I don't think it breaks anything. I'd just call that compute-optimal scaling, which is: once you know how to do it, the second or third time around is always easier and cheaper.
**Student:** 有道理。我们中有些人资源受限——是 GPU 穷人——怎么能在 AI 技术栈中做贡献?
**Student:** That makes sense. Some of us are resource-constrained, GPU-poor. How can anyone contribute in the AI stack?
**Anjney Midha:** 对于课上的学生来说,我们这个学期会提供算力给他们,我觉得还会有一些捐赠,来自各种合作伙伴。
**Anjney Midha:** Well, one: for students who are in the class, we're going to be making compute available to them this quarter, and I think there are going to be some donations as well from various partners. Yeah.
**Mike Abbott:** 课程的。所以如果你是学生,你会没问题的。
**Mike Abbott:** And the class. So if you're a student, you'll be fine.
**Anjney Midha:** 你们应该填那个表格——我记得截止日期是昨天。
**Anjney Midha:** They should... you should fill out the form, which I think was due yesterday.
**Mike Abbott:** 昨天截止了。所以我们会评估需要多少算力,你知道,我们会有一些来自 AMP 池的。但如果你没有访问权限,你就得去做更基础性质的研究,对吧?你得做算法研究,这也是很多学者在做的事情,因为是的,他们现在是 GPU 穷人了。
**Mike Abbott:** It was due yesterday. So then we're going to be sizing up how much compute; you know, we'll have some from the AMP pool. But if you don't have access to compute, you just have to do research that is more fundamental in nature, right? You have to do algorithms research, which is what a lot of academics are doing, because, yeah, they are GPU-poor now.
**Anjney Midha:** 是的。但没错。我觉得那是唯一的——你得有创意,得有创新,得做基础的算法设计——那方面有太多可以做的。
**Anjney Midha:** Yeah. That's right. I think that's the only way: you have to get creative, you have to get innovative, you have to do fundamental algorithm design, and there's so much to be done there.
**Mike Abbott:** 我的意思是说到底 transformer 相对于我看到的一些新架构来说相当低效。我觉得接下来——还有一些方法——我在想,回到 Open Athena——我们参与了 Stanford 的 Marin 项目,Percy Liang 有一个完全开源的 LLM,那是完全开放的——任何人都可以贡献,即使你不在 Stanford。
**Mike Abbott:** I mean, at the end of the day, the transformer is pretty inefficient relative to some of the new architectures I'm seeing. And there are other ways in, too. I'm just thinking, again back to Open Athena, we're involved with the Marin project at Stanford: Percy Liang has a fully open-source LLM, open enough that anyone can actually contribute to it, even if you're not at Stanford.
**Anjney Midha:** 对。对。所以使用开源模型是一个巨大的解锁,对吧?因为那样你就可以搭别人投入的大笔资金的便车。
**Anjney Midha:** Yeah. Yeah. So using open models is a huge unlock, right? Because then you get to piggyback off of a bunch of money somebody else has put in.
**Anjney Midha:** 我觉得嗯,今年你会看到至少三四个实验室放出更多的 pre-train 模型,嗯,你知道,那是一个起点。所以 Gemma 刚出来了。
**Anjney Midha:** I think you'll see more and more pre-trains from at least three or four labs this year, which, you know, is a starting point. So Gemma just came out.
**Mike Abbott:** 嗯,那是一个相当强的模型。
**Mike Abbott:** Um, that's quite a strong model.
**Anjney Midha:** 这是一个相当强的模型,因为是从主要的 Gemini 系列蒸馏出来的。
**Anjney Midha:** It's a pretty strong model because it's distilled from the main Gemini family.
**Anjney Midha:** 嗯,我觉得中国的模型也不差。我希望 Nemotron 项目会继续变好。Mistral 有一堆非常好的开源基础模型。Black Forest 也有。所以我觉得目前来看,今年我感觉还不错,实际上相当有信心,基础模型是有的。开源基础模型是一个好的实验皿。
**Anjney Midha:** Um, I think the Chinese models aren't bad. I hope the Nemotron project will keep getting better. Mistral has a bunch of really good open base models. Black Forest has some, too. So for this year, I'm feeling actually pretty confident that the base models are there. Open base models are a good petri dish.
**Mike Abbott:** 所以回到这个开源的 Marin 项目,它的一个特点是所有训练数据都是开放的。对。对于其他一些模型,你只能拿到权重——你拿到权重,然后你可以用自己的数据做扩展的中期训练或 post-training,用你自己策划的数据做 RL,这就是课程最终要收敛到的——这是一个好的基础模型,用于你的任务,这是你如何策划数据集,这是最后一公里。非常有道理。嗯好的,这里有一堆问题。你觉得这些垂直 AI 初创公司——每个都有自己的工具和界面——是 AGI 导向的,还是我们会看到一个更通用的 agent 界面,能够处理企业中的各种任务?
**Mike Abbott:** So one of the things, coming back to this Marin project, is that all the data it's trained on is open, too. Yeah. For some of these other models, you just get, let's say, the weights. You get the weights, and then you can do your own extended mid-training or post-training with your own curated data, and do RL on your own, which is what the class is going to converge on: here's a good base model for your task, here's how you curate a data set, and here's the last mile. Makes tons of sense. Um, okay, bunch of questions here. Do you think these vertical AI startups, each with their own tools and interfaces, are AGI-pilled, or will we see a more general-purpose agent interface that can handle tasks across the enterprise?
**Anjney Midha:** 我觉得我们会看到的是越来越多的企业不想被锁定在单一模型上。
**Anjney Midha:** I think what we're going to see is more and more enterprises not wanting to be locked into a single model.
**Mike Abbott:** 嗯哼。
**Mike Abbott:** Mhm.
**Anjney Midha:** 所以我认为企业选择信任、选择安心。他们选择的技术可能像云一样——对吧,大公司像 AWS、GCP、Azure 什么的——很可能会走类似的路径。每个主要类别都会是多模型的,我认为企业会选择他们信任的合作伙伴——比如这是一个网络安全合作伙伴,这是一个——嗯,我们已经开始看到这种情况了,对吧,Mistral——ASML 选择了他们作为主要的 AI 合作伙伴,然后整个公司有一大堆用例和任务等等,都是 Mistral 负责的——但这不是一个点状解决方案,而是一个伞式合作关系。我觉得很多——我们已经并且正在进入一个系统采购的世界,不是模型采购,因为对企业来说,试图从这个人那里采购一个模型实在太复杂了。
**Anjney Midha:** And so I think enterprises choose trust, peace of mind. They'll choose the technologies probably the way they chose cloud, right? Large companies picked AWS, GCP, Azure, whatever, and it's likely going to be a similar path. It will be multi-model in each major category, and I think enterprises will choose partners they trust: here's a cybersecurity partner, here's a... We're already starting to see that, right, with Mistral: ASML chose them as their primary AI partner, and there's a whole bunch of use cases and tasks and so on, across the entire company, that Mistral is responsible for. But it's not a point solution; it's an umbrella partnership. And I think we have been, and are, entering a world of systems procurement, not model procurement, because it's just too complicated for enterprises to try to procure one model from this person.
**Mike Abbott:** 他们没有那个能力。
**Mike Abbott:** They don't have the sophistication.
**Anjney Midha:** 没有。而且很痛苦。还有把它们全都用胶带粘在一起的安全问题。
**Anjney Midha:** No. And it's painful. And the security of duct-taping them all together...
**Mike Abbott:** 对。
**Mike Abbott:** Yeah.
**Anjney Midha:** 这简直是噩梦。所以我至少在经验上观察到的——至少在我参与的董事会上——企业的 CISO、CTO,甚至 CEO,那些不在 AI 行业但想把 AI 前沿系统的好处带到自己业务中的人,正在寻找少数几个可信赖的合作伙伴,他们是 AI 原生的——这真的是这样——比如你知道我还在给 Mary Barra 做顾问——同样的思路,我觉得你说得对,大多数科技行业之外的 Fortune 1000 强公司都在走这条路。没错。因为模型是一种基础设施,而你不想——就像任何好的基础设施一样,任何一个部分——你需要冗余。
**Anjney Midha:** It's a nightmare. And so what I'm at least observing empirically, at least on the boards I'm on, is that enterprise CISOs, CTOs, even CEOs, people who are not in the AI business but want to bring the benefits of frontier AI systems to their business, are looking for a handful of trusted partners who are AI-native. It's really true. As you know, I still advise Mary Barra; same approach. And I think you're right: I think most large Fortune 1000 companies outside of tech are going down that path. That's right. Because models are a type of infrastructure, and as with any good infrastructure, any piece of it, you need redundancy.
**Anjney Midha:** 对。
**Anjney Midha:** Yeah.
**Mike Abbott:** 所以无论你和谁合作,都需要在不同的模型提供商之间有冗余。对。
**Mike Abbott:** So whoever you're going to work with needs to have redundancy across different model providers. Yeah.
**Student:** 好的。下一个问题。Claude Code 一天做 214K 个 GitHub commit,太疯狂了。在什么时候 agentic 计算消耗会矮化训练计算?在什么时候 agentic——
**Student:** Okay. Next question. Claude Code doing 214K GitHub commits a day is wild. At what point does agentic compute consumption dwarf training compute? At what point is agentic...
**Anjney Midha:** 我不确定我——我不确定区别是什么,但呃 agentic 计算——好吧,我不知道 agentic compute 在技术上是什么意思,对,我也不知道。但如果你——
**Anjney Midha:** I'm not sure if I... I'm not sure what the difference is, but, uh, agentic compute... okay, I don't know what agentic compute technically means. Yeah, neither do I. But if you're...
**Mike Abbott:** 我觉得 agentic 这个术语太痛苦了——被过度使用了。确实是。嗯,但如果大方向上这个问题是:推理计算什么时候会——
**Mike Abbott:** I think that term agentic is painful; it's so overloaded. It really is. Um, but directionally, if the question is when will inference compute...
**Anjney Midha:** 我觉得那可能已经超过训练了。已经超过了。但这是另一个问题——训练和推理之间的区别实际上也没那么清晰,因为你为 RL 做了大量推理——就像你做 post-training 的方式——是你做 RL rollouts——就是你在 CPU 上、有时在 GPU 上做这些 rollout——那是推理,因为你在模拟任务的执行,然后你把 rollout 链拿回来,再输入到你的训练运行中——推理是训练预算的一部分。对,
**Anjney Midha:** I think that's probably overtaken training; it already has. But this is the other problem: the distinction between training and inference is actually not that clear either, because you do so much inference for RL. The way you make post-training happen is you do RL rollouts: on CPUs, and sometimes GPUs, you run these rollouts. That's inference, because you're simulating the task happening. Then you take the rollout chains and pipe them back into your training run. Inference is part of the training budget. Yeah,
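The loop Anjney describes can be sketched in a few lines. Everything below is a toy illustration with invented names; the only point it makes is the one from the conversation: rollout generation is forward-pass (inference) work whose cost lands inside the post-training budget.

```python
# Toy RL post-training loop (invented names, not any lab's real pipeline).
# Rollouts are inference; the weight update is training; both draw on
# one shared compute budget.

def generate_rollout(policy, task, steps=8):
    """Inference: simulate the task with the current policy."""
    trajectory = []
    state = task["start"]
    for _ in range(steps):
        action = policy(state)            # one forward pass = inference cost
        state = state + action
        trajectory.append((state, action))
    return trajectory

def post_train(policy_weight, tasks, budget):
    """One RL iteration: rollouts (inference) feed the update (training)."""
    inference_steps = 0
    training_steps = 0
    for task in tasks:
        rollout = generate_rollout(lambda s: policy_weight, task)
        inference_steps += len(rollout)              # rollout cost
        reward = sum(action for _, action in rollout)  # toy reward signal
        policy_weight += 0.01 * reward               # toy gradient step
        training_steps += 1
    total = inference_steps + training_steps         # one shared budget
    return policy_weight, total <= budget, inference_steps, training_steps
```

Even in this toy version, the inference steps outnumber the training steps by the rollout length, which is why the training/inference split blurs once RL post-training dominates.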
**Mike Abbott:** 我就把它全看作一个大的计算预算,需要灵活分配到各个集群上。它是可替代的,我们正朝那个方向走。训练 vs 推理、RL vs 非 RL、GPU vs CPU——这些区别都是某个时间点上的分类。用来帮我们理解这个行业是没问题的,但我们的方向是——至少那些工作负载最大的团队——只想要一个大的灵活计算池,他们可以保留基础份额并执行。对,很有道理。
**Mike Abbott:** I just see it all as one big compute budget that needs to be flexibly allocated across clusters. It's fungible, and that's where we're going. Training versus inference, RL versus not, GPUs versus CPUs: these distinctions were all point-in-time categorizations. They're fine to help us make sense of the industry, but where we're headed, at least for the teams with the largest workloads, is one big flexible pool of compute that they can reserve a base of and execute against. Yep, makes sense.
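Mike's "one big fungible pool" framing can be sketched as a trivial allocator. All names here are invented; real cluster schedulers are far more involved, but the idea is just that the job's label (training, rollout, serving) is metadata, not a separate budget.

```python
# Minimal sketch of a fungible compute pool (invented names).
# Any kind of job draws from the same capacity; "kind" is only a label.

class ComputePool:
    def __init__(self, total_gpu_hours):
        self.total = total_gpu_hours
        self.used = 0
        self.by_kind = {}

    def allocate(self, kind, gpu_hours):
        """Try to claim capacity; the pool doesn't care what the job is."""
        if self.used + gpu_hours > self.total:
            return False                  # pool exhausted, job must wait
        self.used += gpu_hours
        self.by_kind[kind] = self.by_kind.get(kind, 0) + gpu_hours
        return True

    def release(self, kind, gpu_hours):
        """Return capacity to the pool when a job finishes."""
        self.used -= gpu_hours
        self.by_kind[kind] -= gpu_hours
```

A training run finishing frees capacity that a serving job can immediately claim; that interchangeability is the fungibility being described.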
**Student:** 嗯,Anthropic 刚达到 140 亿美元年化收入。在什么时候收入飞轮会使新实验室在结构上无法竞争?
**Student:** Um, Anthropic just hit a $14 billion run rate. At what point does the revenue flywheel make it structurally impossible for new labs to compete?
**Anjney Midha:** 它说的——我以为他们宣布的是 300 亿。
**Anjney Midha:** Did it say 14? I thought they announced they had 30.
**Mike Abbott:** 对,我以为他们宣布的也是 300 亿,那就更大了。但我觉得关键是——当增长如此之快,而且他们的收入在加速——对——这对新进入者意味着什么?
**Mike Abbott:** Yeah, I thought they announced 30 too, which is even bigger. But I think the point is: when it grows so much faster and their revenue is accelerating, right, what does that mean for new entrants?
**Anjney Midha:** 我觉得现在进入编程领域非常困难了。像 Anthropic、OpenAI、Gemini 在编程方面——你知道在像任务关键的编程数据集方面——比如 Mistral 在欧洲的政府场景中已经成为编程模型的一个提供商,等等。所以我觉得编程战争的格局——
**Anjney Midha:** I think it's very hard to enter coding now. Anthropic, OpenAI, Gemini in coding, you know. In the case of mission-critical coding datasets, Mistral has emerged as a provider of coding models in government contexts in Europe, and so on. So I think that's the shape of the coding wars.
**Mike Abbott:** 我觉得那班火车已经开走了。对,我觉得你说得对。对。嗯,下一个问题。harness engineering 是否正在变得比 LLM 本身更重要?有些人说 LLM 正在商品化。
**Mike Abbott:** I think that train has left the station. Yeah, I think you're right. Yeah. Um, next question. Is harness engineering becoming more important than LLMs, given that some people say LLMs are becoming commoditized?
**Anjney Midha:** 我不知道……我不太确定我完全理解这个问题。Harness engineering 是否比 LLM 更重要?我的意思是,你是在 LLM 外面套一个 harness。
**Anjney Midha:** I don't know if I quite understand the question. Is harness engineering becoming more important than LLMs? I mean, you're putting the harness around an LLM.
**Mike Abbott:** 对。对。这些不是对称的比较。
**Mike Abbott:** Yeah. Yeah. These are not symmetric comparisons.
**Anjney Midha:** Harness 是让一个系统协同工作的东西,而 LLM 是系统中的一个单独组件。
**Anjney Midha:** The harness is the thing that allows a system to work together and an LLM is an individual component of the system.
**Mike Abbott:** 所以简短的回答是——这不是同类比较。
**Mike Abbott:** So the short answer is not apples and oranges.
**Anjney Midha:** 如果你想重新组织一下这个问题再问,请便。
**Anjney Midha:** If you want to try reasking that question, go for it.
**Student:** 好的。
**Student:** Yeah.
**Mike Abbott:** 再清楚一点。[哼了一声] 嗯,你们怎么看 Claude Code 泄露事件?我很惊讶有那么多 feature flag 是关闭的。太厉害了。
**Mike Abbott:** With a little more clarity. [snorts] Um, what do you guys think of the Claude Code leak? I was surprised at how many, uh, feature flags were turned off. Amazing.
**Anjney Midha:** 好的。我实际上还没来得及看,所以我没有什么意见。
**Anjney Midha:** Okay. I actually I haven't had a chance to look at it, so I don't have an opinion on it.
**Mike Abbott:** 嗯。我不确定我有什么特别强的观点。我没有仔细看过,但是 feature flag 被关掉了。
**Mike Abbott:** Yeah. Um, I don't know if I have a really strong opinion. I didn't go through it in detail, but feature flags were turned off.
**Anjney Midha:** 很明显这家公司就是在猛干。我的意思是
**Anjney Midha:** Clearly the company is just, like, kicking ass. I mean
**Mike Abbott:** 对。对。他们是很好的工程师。
**Mike Abbott:** Yeah. Yeah. They're good engineers.
**Anjney Midha:** 嗯,对。我们看看 Mercor 那件事会怎么样
**Anjney Midha:** Um, yeah. We'll see what happens with Mercor on that same front.
**Mike Abbott:** 那可真是很艰难。
**Mike Abbott:** That was tough.
**Anjney Midha:** 对。因为一旦出了安全事件,然后数据被泄露,企业就会开始说,我能信任你吗?
**Anjney Midha:** Yeah. Because you have one of those security incidents and then data gets leaked, and enterprises start going, can I trust you?
**Mike Abbott:** 我能信任你吗?我得把这个收回到内部来做。
**Mike Abbott:** Can I trust you? I got to bring this in house.
**Anjney Midha:** 对。
**Anjney Midha:** Yeah.
**Mike Abbott:** 所以,如果你做这种数据相关的事情——数据本身不是……我的意思是有很多团队能产出数据,但数据越来越是关键任务数据。所以安全性就是护城河,我会这么说。不是——对于企业客户尤其如此,以及前沿实验室——你如果失去了那份信任,就很难——很难恢复。我会说那是生存级别的问题。
**Mike Abbott:** So, if you're in this data business... the data itself is not... I mean, there are many teams who can produce data, but it's more and more mission-critical data. So security is the moat, I would say. With enterprises in particular, and frontier labs: if you lose that trust, it's very hard to recover. I would say that's existential.
**Anjney Midha:** 我觉得说得对。
**Anjney Midha:** I I think that's right.
**Student:** 是否有可能创建一个系统,完全且可验证地解释一个 AI?当前的方法留有盲点,而 scaling 仍在继续。
**Student:** Is it possible to create a system that fully, accurately, and verifiably interprets an AI entirely? Current methods leave blind spots, and scaling continues regardless.
**Mike Abbott:** 我可能需要能看到这个问题。
**Mike Abbott:** I might need to be able to read the question.
**Student:** [笑声]
**Student:** [laughter]
**Mike Abbott:** 嗯,就在这里——完全且可验证地解释一个 AI。可验证地完整解释。
**Mike Abbott:** Um, right here: accurately and verifiably interpret an AI entirely. Verifiably interpret it entirely.
**Anjney Midha:** 对,我不太确定。
**Anjney Midha:** Yeah, I'm not too sure.
**Mike Abbott:** 嗯,好的。是的,我觉得这个问题是关于模型的可解释性,比如机械可解释性——如果这就是问题的话——或者理解实际上发生了什么。
**Mike Abbott:** Um, okay. Yeah, I think the question is about interpretability of the model, like mechanistic interpretability, if that's the question, or understanding what's actually happening.
**Anjney Midha:** 对。为什么模型会做它所做的事情?是的。我觉得可解释性是 AI 研究中最令人兴奋的、最未被充分探索的领域之一,因为它最终能提高可靠性。就像我觉得微调、post-training 基本上是作为护栏的黑客手段,这些模型非常容易被越狱,对吧?
**Anjney Midha:** Yeah. Why is the model doing what it's doing? Yeah. I think interpretability is one of the most exciting, underexplored areas in AI research, because it ultimately improves reliability. Like, I think fine-tuning and post-training are basically hacks as guardrails; these models are super easy to jailbreak, right?
**Mike Abbott:** 所以对模型内部发生的事情有更系统层面的理解——如何引导它、网络的哪些部分在不同任务中被激活——我认为这很有帮助。嗯,Anthropic 在这个领域发表了很多博客文章。
**Mike Abbott:** And so a more systems-level understanding of what's going on in the model, how to steer it, which parts of the network are activating for different tasks, I think is quite helpful. Um, Anthropic has had a number of blog posts in this space.
**Anjney Midha:** 他们在这方面做了大量研究,所以如果你对这个领域感兴趣,我建议去看一看。
**Anjney Midha:** They've done a lot of research on this, so if you're interested in that area, I'd recommend going and taking a look.
**Mike Abbott:** 对。所以,机械可解释性真的很令人兴奋。还有很多其他的可解释性技术。我对此非常看好,你知道。我觉得做可解释性研究的一个挑战当然是你需要访问模型权重才能做好。
**Mike Abbott:** Yeah. So, mechanistic interpretability is really exciting. There's a bunch of other techniques for interpretability. I'm very bullish on it, you know. Um, I think one of the challenges, of course, in doing interpretability research is that you need access to the model weights to do it well.
**Anjney Midha:** 所以,如果你不在某个实验室,除非是用开源模型来做,否则很难做到。这也是另一个地方——我觉得如果我们的目标是让模型对所有人来说都更可靠,那么拥有真正好的前沿开源基础模型,让大学和学术研究人员能够审视并做开放模型研究,是非常有帮助的。
**Anjney Midha:** And so it's kind of hard to do if you're not at one of the labs, unless you're doing it with open models. And this is another place where, if our goal is to make models reliable more broadly for everybody, then having really good frontier open base models, so that universities and academic researchers can introspect them and do open-model research, is quite helpful.
**Mike Abbott:** 嗯,或者我希望看到的是某种全行业的机构,让闭源提供商在更广泛发布之前让可解释性研究人员容易地访问他们的模型。这样你就可以做大量可解释性研究,测试它,发布结果,然后我们一起变得更安全。
**Mike Abbott:** Um, or what I would love to see is some kind of industry-wide body where, um, closed-source providers make their models easily accessible to interpretability researchers before a broader release. So then you can do a bunch of interpretability research, test it, you know, publish results, and then we all get safer together.
**Anjney Midha:** 嗯,但是是的,我真的觉得——如果我们攻克了可解释性,在关键任务场景中部署这些东西时我会感觉好很多。是的。是的。是的。我觉得说得对。呃,Windsurf 收购事件表明实验室会在一夜之间切断 API 访问。如果随时都可能被撤销,开发者应该如何在基础模型之上构建?
**Anjney Midha:** Um, but yeah, I do think deploying these things in mission-critical contexts would be... I'd feel a lot better if we had cracked interpretability. Yeah. Yeah. Yeah. I think that's right. Uh, the Windsurf acquisition showed labs will cut off API access overnight. How should developers build on top of foundation models if the rug can be pulled at any time?
**Mike Abbott:** 是的。是的,这是个好问题。嗯,这有一个好答案。
**Mike Abbott:** Yeah. Yeah, that's a good question. Um, there's a good answer to that.
**Anjney Midha:** 是的,基本上如果你依赖某一个人的基础设施,而那个基础设施挂掉了,这有点像多云的情况。我的意思是它回到了同样的问题,我认为答案是你必须——你必须能够支持多个模型。
**Anjney Midha:** Yeah. I mean, basically, if you're dependent on one person for your infrastructure and that infrastructure goes out... it's a little bit like multi-cloud. I mean, it comes back to that same piece, and I think the answer is you have to be able to support multiple models.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 你知道,理解哪些模型在某些任务上最有效。我见过很多初创公司,他们根据活动的不同使用模型集成,对吧?我相信你也看到了同样的情况。
**Anjney Midha:** You know, and understanding what models are most effective at certain tasks. The number of startups I've seen that use an ensemble of models depending on what the activity is, right? I'm sure you see the same thing.
**Mike Abbott:** 是的。所以实际上确实如此——你可以为自己的公司拥有一个编程模型。
**Mike Abbott:** Yes. So that's actually right where you can have a coding model for your own company.
**Anjney Midha:** 嗯哼。
**Anjney Midha:** Mhm.
**Mike Abbott:** 在那里你拿一个开源模型,用自己的数据训练——专门为你的代码库、你的风格偏好等等量身定制——更重要的是有时出于安全考虑。嗯哼。
**Mike Abbott:** Where you take an open model and train it on your data, custom-tailored specifically for your code base, your sort of stylistic preferences, and so on, and more importantly, sometimes for your security context. Mhm.
**Anjney Midha:** 嗯,这是个很好的观点。而且取决于你在哪个垂直领域,比如如果你在高度监管的领域,你必须——你不能使用云。
**Anjney Midha:** Um, that's a really good point. And it also depends on what vertical you're in; like, if you're in a highly regulated space, you can't use the cloud.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 我的意思是,如果你在运行国防工作负载,
**Anjney Midha:** I mean, if you're running a defense workload,
**Mike Abbott:** 是的。
**Mike Abbott:** Yep.
**Anjney Midha:** 它必须在弗吉尼亚州的服务器上运行,完全由政府管理,完全气隔。
**Anjney Midha:** it's got to be running on a server in Virginia or whatever, fully managed by the government and completely air-gapped.
**Mike Abbott:** 完全气隔。是的。
**Mike Abbott:** Completely air-gapped. Yeah.
**Anjney Midha:** 所以,嗯,
**Anjney Midha:** And so, um,
**Mike Abbott:** 我觉得这是一个大市场。还有很多提供商——这仍然是一个不断发展的领域,但还没有真正好的方案。我觉得 Mistral 在这方面已经搞清楚了,
**Mike Abbott:** I think there's a big market for that. There are lots of providers; it's still an evolving space, but there's no really good... I think Mistral has figured this out in
**Anjney Midha:** 我觉得 Mistral 像是开创了一条路。
**Anjney Midha:** I'd say Mistral feels like they've pioneered a way.
**Mike Abbott:** 对,在美国我觉得
**Mike Abbott:** Yeah in the US I think
**Anjney Midha:** 我觉得没有人——还不存在。
**Anjney Midha:** I don't think anyone has; it doesn't exist yet.
**Mike Abbott:** 是的。[哼了一声] 让我们看看——你们怎么看 Meta 的裁员和每月 6 万亿 Claude token 使用量?
**Mike Abbott:** Yeah. [snorts] Let's see... um, what do you think of Meta's workforce reduction and 6 trillion Claude tokens a month of usage?
**Anjney Midha:** 六万亿月度 Claude token。我明白了。我还没关注这件事发生了什么。他们有一个 token 使用排行榜,两天后就下线了。为什么下线了?
**Anjney Midha:** Six trillion Claude tokens a month. I see. I haven't paid attention to what's going on there. They had a leaderboard for token use and they took it down after two days. Why did they take it down?
**Mike Abbott:** 我不确定。他们没有解释原因。我不知道他们是否不想透露自己烧了多少。我不确定。谁知道呢?
**Mike Abbott:** I'm not sure. They didn't explain why. I don't know if they didn't want to share how much they're burning. I'm not sure. Who knows?
**Anjney Midha:** 好的。是的。嗯,唯一有点有趣的是,Meta 的领导层其实都不在排行榜上。全都是工程师。
**Anjney Midha:** Okay. Yeah. Um, the only thing that was kind of interesting from that was that none of the leaders at Meta were really on the leaderboard. It was all engineers.
**Mike Abbott:** 哦,好的。这说得通。
**Mike Abbott:** Oh, okay. That makes sense.
**Anjney Midha:** 有点说得通。但人们对此有些嘲讽。嗯,我们是否正在走向两个独立的 AI 基础设施生态系统,美国和中国?这对行业意味着什么?
**Anjney Midha:** It kind of makes sense. But people were kind of poking at that a little bit. Um, are we heading toward two separate AI infrastructure ecosystems, US and China? And what does that mean for the industry?
**Mike Abbott:** 这——一直如此——这在很多行业中都是这样。
**Mike Abbott:** That's been the case for many industries.
**Anjney Midha:** 12 年了。不,基础设施生态系统——中国的基础设施生态系统与美国已经完全不同了,大概有 15 年了。
**Anjney Midha:** 12 years. No, the infra ecosystem in China has been completely different from ours for like 15 years now.
**Mike Abbott:** 至少 15 年了。
**Mike Abbott:** At least 15 years.
**Anjney Midha:** 是的。自从防火长城建立之后。
**Anjney Midha:** Yeah. Once the Great Firewall was put up.
**Mike Abbott:** 是的。所以我不认为这是新鲜事。嗯,如果转变是走向可信合作关系,那么当技术生态系统通常奖励快速创新时,信任如何建立?嗯,我觉得因为信任是赢得的,所以需要时间。
**Mike Abbott:** Yeah. So I don't think that's new. Um, if the shift is toward trusted partnerships, how can trust be built when technology ecosystems generally reward rapid innovation? Well, I think because trust is earned, it takes time.
**Anjney Midha:** 是的。嗯,扩展信任的方式是通过开放标准,对吧?这就是我们在课堂上讨论的内容。是的。比如这里是协议。这里是开放标准。TCP/IP。
**Anjney Midha:** Yeah. Well, the way to scale trust is through open standards, right? That's what we're talking about in the class. Yeah. Like, here's the protocol. Here's an open standard: TCP/IP.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 浏览器,AC/DC,这些是标准。SOC 2 合规。
**Anjney Midha:** Browsers, AC/DC: here are the standards. SOC 2 compliance.
**Mike Abbott:** 这些都是标准。然后你发布它们。只要提供商遵守这些标准,使用该模型的人就可以信任该模型是合规的。嗯哼。
**Mike Abbott:** These are standards. And then you publish them. And as long as the provider adheres to those standards, then whoever's consuming the model can trust that the model is compliant. Mhm.
**Anjney Midha:** 所以我觉得这就是你允许创新发生的方式——标准越开放、越可访问越好,然后你让创新者专注于重要的事情,即在某个新领域推动前沿。
**Anjney Midha:** So I think that's how you allow innovation to happen: the more those standards are open and accessible, the better, and then you let the people who are innovating focus on the things that matter, which is pushing the frontier in some new area.
**Mike Abbott:** 你知道,在某个时候,SOC 2 对软件初创公司来说真的很难做,现在有了 Vanta 这样的东西,以及所有其他做 SOC 2 即服务的东西,现在很多初创公司可以很快获得认证。我觉得这就是会发生的事情。但你确实需要标准。这就是问题所在。我们现在还没有标准。
**Mike Abbott:** You know, at some point SOC 2 was really hard to do for software startups, and now there's this thing called Vanta, and all these other things that do SOC 2 as a service, and now many startups can just get certification really fast. I think that's what'll happen. But you do need standards. That's the problem: we don't have standards right now.
**Student:** 你们对哪些空间感到兴奋,希望看到更多人在这些领域创业?
**Student:** What spaces are you guys excited to see more people found in?
**Anjney Midha:** 我来接这个,我来。我是这么想的。
**Anjney Midha:** I can take I'll take this one. I think so.
**Anjney Midha:** 最近我一直在花时间研究那些历史上不一定非常技术性的领域。比如再保险。我一直在研究那里的机会。这周我和 Merck 的 CIO 见了面,只是在头脑风暴——除了药物发现之外,实验室内部还有哪些独特的机会。
**Anjney Midha:** I've been spending time lately with domains that are not necessarily historically super technical. So, like, reinsurance. I've been looking at what the opportunities are there. This week I met with the CIO of Merck, just kind of brainstorming what the unique opportunities are there besides drug discovery, like within the lab.
**Mike Abbott:** 嗯
**Mike Abbott:** Yeah.
**Anjney Midha:** 我觉得,你知道,我鼓励大家去研究那些有点偏门的领域。
**Anjney Midha:** I think, you know, I'd encourage folks to look at these domains that are kind of out in left field.
**Mike Abbott:** 是的。[笑声]
**Mike Abbott:** Yes. [laughter]
**Anjney Midha:** 是的。
**Anjney Midha:** Yes.
**Mike Abbott:** 你知道那些历史上也许有技术,但没有像工程师那样的人员的领域。所以,建立一个 AI 核保员意味着什么——你把所有人工判断都拿走?
**Mike Abbott:** Domains that, you know, let's say historically maybe have had tech, but they haven't had engineering staff. So, what does it mean to build an AI underwriter where you're taking all human judgment out?
**Anjney Midha:** 是的。
**Anjney Midha:** Yep.
**Mike Abbott:** 嗯,是的,那实际上是一个很大的问题——如何做到风险评估是一个非常大的领域,肯定。
**Mike Abbott:** Um, yeah, that's a huge one actually: how do you do risk assessment. That's a really big one, for sure.
**Anjney Midha:** 显然他们历史上使用过模型,但那已经是老东西了。是的。是的。
**Anjney Midha:** And obviously they've used models historically, but, you know, that's old. Yes. Yes.
**Mike Abbott:** 我觉得关键是现在有更多数据集可以使用。
**Mike Abbott:** And I think the point is like there's now more data sets you could use.
**Anjney Midha:** 是的。
**Anjney Midha:** Yeah.
**Mike Abbott:** 去做那些精算模型。
**Mike Abbott:** To go do those actuarial models.
**Anjney Midha:** 是的。
**Anjney Midha:** Yes.
**Mike Abbott:** 嗯,我觉得这很有意思。
**Mike Abbott:** Um which is pretty interesting I think.
**Anjney Midha:** 我觉得我们有很多历史数据可以用来做回测验证。所以那是可验证的。是的。所以那是一个。嗯,我想说在我们的课上,我们有——我们昨天和 Andy 讨论过,你可能记得——关于一个好的实时视频模型可以像多模态视频模型一样用于计算机使用、机器人技术和物理理解。所以我非常兴奋的一个领域是如何使用这些视频模型——有一些好的开源视频模型即将到来——如果你把那些只是好的多模态视频模型,把它们变成动作预测,用它们作为动作预测引擎,那么它就不只是生成下一帧视频像素,而是生成下一个动作,对吧?那个动作可以是你要在计算机上按下的键,可以是机器人的下一个手臂动作,而动作是一个相当通用的东西。所以如果你用下一个动作取代下一个 token 或下一个视频帧,那是一个超级强大的基本单元。
**Anjney Midha:** And I think we have a lot of historical data, so you can run the backtest with verification. So that's verifiable. Yeah. So that is one. Um, I would say in the class, we talked about this yesterday with Andy, if you remember: a good real-time video model, a multimodal video model, being usable for computer use, for robotics, for physical understanding. So one area I'm definitely very excited about is how you can take... and there are a few good open video models coming. If you take those video models that are just good multimodal video models and you turn them into action prediction, you use them as action prediction engines, then it's not just generating the next frame, the next video pixels, but generating the next action, right? And that action can be the keys you're going to press on your computer, it can be the next arm movement for robotics, and an action is a pretty general-purpose thing. So if you replace next token or next video frame with next action, that is a super powerful primitive.
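The "next action instead of next token" primitive being described reduces to a simple closed loop: observe, predict an action, apply it, feed the new observation back in. This is a toy sketch with invented stand-ins for the model and the environment, not any real system.

```python
# Toy next-action loop (all names invented). The "model" maps an
# observation to an action; the "environment" applies the action and
# returns the next observation, closing the loop.

def toy_action_model(observation):
    """Stand-in for a multimodal model head that emits an action."""
    # e.g. keep moving toward a target at coordinate 10
    return 1 if observation < 10 else 0

def step_environment(observation, action):
    """Stand-in for the world: a keypress, a robot arm motion, etc."""
    return observation + action

def run_agent(start, max_steps=50):
    """Recursive loop: predict action, act, pipe the result back in."""
    obs, actions = start, []
    for _ in range(max_steps):
        action = toy_action_model(obs)
        if action == 0:                  # model signals "done"
            break
        obs = step_environment(obs, action)
        actions.append(action)
    return obs, actions
```

Swapping `toy_action_model` for a video model head and `step_environment` for a GUI or robot is the generalization the conversation is pointing at.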
**Mike Abbott:** 我一直在想,自动驾驶领域的模型可能是最接近的——自动驾驶、计算机使用——这关于互动很有意思,因为如果你想想 Andy 的讲座,他试图说的是,老世界是文本到图像,对吧——提示进去,图像出来——但当你能够改变架构,使模型可以生成一个动作,然后你可以采取那个动作并将其输入到下一个阶段,你就得到了这种递归——这种递归能力——那是如此强大,因为你可以说好,哪里有对持续自动化的需求,对,带有一个动作,以及
**Mike Abbott:** I was thinking the models in autonomous driving are probably closest. Autonomous driving, computer use. And this is interesting about interaction, because if you think about Andy's lecture, what he was trying to say is: the old world was text to image, right? Prompt in, image out. But when you change the architecture such that the model can generate an action, and then you can take that action and pipe it back in to the next stage, you get this recursion, this recursive ability. And that is so powerful, because you could say, okay, where is there a need for continuous automation, right, with an action, and
**Anjney Midha:** 我觉得这就是为什么机器人技术正在经历这场巨大革命——工业自动化,只要你有一条工厂生产线,你有持续的数据需要驱动某些机器人运动或某些变化。这些动作预测模型可以非常有效地桥接软件和物理。
**Anjney Midha:** I think that is why robotics is going through this huge revolution: industrial automation, wherever you've got a factory line where you have continuous data that needs to power some robotic movement or some change. These action prediction models can bridge software and the physical world very efficiently.
**Mike Abbott:** 嗯哼。
**Mike Abbott:** Mhm.
**Anjney Midha:** 以过去有时需要数年才能自动化的方式,你将能够在几周到几个月内用这些动作预测模型完成。对此非常看好。我认为好消息是会有各种开源模型。显然 BFL 在那个领域有重大投资即将到来,我认为很多研究人员应该拿他们要发布的东西,尝试为动作预测定制它。嗯,想必会有特定领域的动作。
**Anjney Midha:** Things that used to take sometimes years to automate, you will be able to do in weeks and months with these action prediction models. Very bullish on that. And I think the good news is there will be a variety of open models. Obviously BFL has a major investment in that area coming, and I think a lot of researchers should take what they're going to put out and try to customize it for action prediction. Um, and presumably there'll be actions by particular domain.
**Mike Abbott:** 是的。嗯,所以我不认为那些动作会立即泛化,而不需要针对特定实施的最后一公里定制。比如机器人有不同的实施方式——你可能有双足、单轮等等,最后一公里集成需要一点人工工作。这就是为什么有很多部署工作要做。但如果你能连接——如果你有一个通用动作预测模型,你可以发挥创意,想办法把它连接到物理系统中。
**Mike Abbott:** Yes. Um, so I don't know that those actions will generalize immediately without a little bit of last-mile customization for a particular embodiment. For example, robots have different embodiments: you might have a biped, a unicycle, whatever. That last-mile integration takes a little bit of work, human work. And that's why there's a lot of deployment magic to do. But if you have a general-purpose action prediction model, you can be creative about how to hook it up in a physical system.
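The last-mile embodiment point can be made concrete with a small adapter layer: a general predicted action is translated into a robot-specific command via a hand-written map, which is exactly the human integration work being described. All names below are invented.

```python
# Toy embodiment adapter (invented names). The model emits general
# actions; the hand-written per-robot map is the "last mile" work.

GENERAL_ACTIONS = {"forward", "left", "right", "stop"}

# One map per embodiment: this table is what humans write during
# deployment, and it differs for a biped vs. a wheeled robot.
EMBODIMENT_MAPS = {
    "biped": {
        "forward": "step_gait", "left": "turn_hips_left",
        "right": "turn_hips_right", "stop": "balance_hold",
    },
    "wheeled": {
        "forward": "spin_wheels", "left": "differential_left",
        "right": "differential_right", "stop": "brake",
    },
}

def adapt(action, embodiment):
    """Translate a general predicted action into a robot-specific command."""
    if action not in GENERAL_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return EMBODIMENT_MAPS[embodiment][action]
```

The general model never changes across robots; only the small table does, which is why the integration is "a little bit of work" rather than retraining.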
**Anjney Midha:** 嗯,那里的回报是巨大的。
**Anjney Midha:** Um the returns there are massive.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 让我们看看。呃
**Anjney Midha:** Let's see. Uh
**Anjney Midha:** 我有一个问题给你。
**Anjney Midha:** I've got one for you.
**Mike Abbott:** 什么?
**Mike Abbott:** What?
**Student:** Mike,你曾在 Microsoft、Apple、Twitter 和 GM 做过高管。在 AI native 公司做技术领导和在传统机构有什么不同?
**Student:** Mike, you've been an exec at Microsoft, Apple, Twitter, and GM. How different is technical leadership at an AI native company versus a legacy institution?
**Mike Abbott:** 天哪。我会说嗯,非常不同。我会说比如我在 Apple 的时候,你知道,我们显然专注于构建产品,对吧?嗯,每个人都非常技术化。我的意思是,我和 Craig 密切合作过。Craig Federighi 是非常深入技术的。每个人都在非常细节的层面上理解事物是如何运作的,对吧?我们需要构建什么?嗯,当我去了 General Motors,和 Mary Ba——Mary Barra 一起工作——我会说你知道,她理解尤其是在电动车领域基本上就是电池和软件,她需要引进那些人才。所以实际上现在我们在 Mountain View 有大约 300 人在为 General Motors 工作。我觉得更有意思的是在 AI 这方面——你知道我觉得 Apple 因为种种原因错过了 AI。嗯,但我 [清了清嗓子] 觉得很多 CEO 像 Mary,你知道,甚至在董事会层面,他们都在讨论,你在 AI 方面做了什么?她知道要吸引对的人对她来说很难。没有那些对的人,就很难搞清楚什么是对的应用场景去攻克。所以我觉得这就是很多初创公司的机会——可以被创建出来与像 General Motors 这样的公司合作,去解决他们的特定问题,对吧?希望能泛化到其他领域,如果你理解我的意思的话。
**Mike Abbott:** Oh boy. I'd say, um, very different. I'd say, like, when I was at Apple, you know, we were obviously focused on building products, right? Um, and everyone was very technical. I mean, I worked closely with Craig. Craig Federighi is deeply technical. Everyone really understands at a very detailed level how things work, right? What do we need to build? And, um, when I went to General Motors, working with Mary Barra, I'd say that, you know, she understood that the world, especially with EVs, is basically batteries and software, and she needed to bring in that talent. And so actually now we've got about 300 people in Mountain View working for General Motors. What's more interesting, though, is on the AI side. You know, I think Apple missed AI for a variety of reasons. Um, but I [clears throat] think a lot of CEOs like Mary, you know, even at the board level, are having conversations on, you know, what are you doing in AI? And she knows that it's hard for her to attract the right people. Without those right people, it's hard to figure out what the right use case is to go tackle. And so I think this is where there's an opportunity for a lot of startups that could get created to work with a company like General Motors to go solve their particular problem, right? That hopefully generalizes to other domains, if that makes sense.
**Anjney Midha:** 是的。这就是为什么我觉得前置部署工程师(forward deployed engineer)的兴起——不是兴起而是说——
**Anjney Midha:** Yes. That's why I think this rise of the forward deployed engineer — not rise but like —
**Mike Abbott:** 我觉得大部分成功的 AI 部署都有很重的服务组件。
**Mike Abbott:** I think most of these successful AI deployments, yeah, have a heavy services component.
**Anjney Midha:** 是的。所以我一直——我同意你的说法。我觉得 forward deployed engineering 这个术语——那是我们十年前、十五年前的说法——当时我大学毕业 Palantir 正在起步。我觉得我观察到的区别是现在变成了前置部署研究(forward deployed research)。因为做集成的人实际上需要对机器学习 pipeline 有相当扎实的理解——比如如何构建正确的 eval,如何用正确的数据集做 RL,如何把 representation 做对——所以这更像是一种前沿部署式的研究,而不是传统的工程。
**Anjney Midha:** Yeah, I agree with you. I think forward deployed engineering was the term of 10, 15 years ago, when I was graduating from college and Palantir was getting going. The difference I've observed is that now it's forward deployed research, because the people doing the integration actually need a fairly robust understanding of machine learning pipelines: how do you construct the right eval, how do you do RL with the right datasets, how do you get the representation right. So it's a form of forward deployed research more than it is traditional engineering.
**Mike Abbott:** 嗯,我其实很喜欢这个说法,我觉得你说得对。我要偷走这个说法。
**Mike Abbott:** Yeah, actually I like that. I think that's right. I'm going to steal that.
**Anjney Midha:** 随便拿去用。[笑声]
**Anjney Midha:** Sure, you can have it. [laughter]
**Student:** 你们俩都见过公司失败的情况。你们最常见到的、大家又不怎么坦诚讨论的领导力失败模式是什么?
**Student:** You've both seen companies fail. What's the leadership failure mode that you see the most often that people don't talk about honestly?
**Mike Abbott:** 天哪。
**Mike Abbott:** Oh boy.
**Anjney Midha:** 哦,对我来说这个太简单了。是文化。
**Anjney Midha:** Oh, that's an easy one for me. It's culture.
**Mike Abbott:** 是啊,我觉得文化这东西太容易搞砸了。一旦搞砸了,就很难恢复,因为一旦人们失去了信任——你知道的,公司的领导层说要做一件事却没有做到。嗯,你能从团队那里得到几次宽容,你知道的,一两次还行,但你说了要做某件事,然后不管什么原因没做到,或者偏离了计划,呃,你的团队就会对你失去信任,对吧?这就是为什么我觉得使命对齐如此关键——在每一个你有机会向团队展示的时刻,嘿,这就是真正重要的使命。我们会做出相当艰难的取舍。我们可能不会有最高的薪酬包。我们甚至可能没有最好的福利等等,但这是真正重要的使命。我们有最好的团队。而这是我们独特的配方或者观点,说明我们为什么能赢。
**Mike Abbott:** Yeah, I think culture is so easy to mess up. Once it's messed up, it's so hard to recover, because once humans lose trust that, you know, the leadership of a company will do what it says it's going to do. You get a few passes from your team, you know, once or twice, but the more times you say you're going to do something and then, for whatever reason, you don't, or there's a deviation from plan, uh, your team loses trust in you, right? And that's why I think mission alignment is so critical, where at every opportunity you have a chance to show your team: hey, this is the mission that matters. We're going to make pretty hard tradeoffs. We may not have the biggest pay packages. We may not even have the best benefits and so on, but this is the mission that matters. And we've got the best team. And here's our unique recipe, or point of view, on why we can win.
**Anjney Midha:** 而在这一路上,你会遇到那么多——真是疯了——那么多分散注意力的诱惑。很多时候你需要——我把公司比作一次公路旅行,对吧?你有一个——你大致有一个目的地在心里,然后你有一辆车,如果你是创始人你就坐在驾驶座上,你把朋友们叫上车说,伙计们我们去那里。有时候你会迷路。你得走一些弯路和岔道。但只要所有人都觉得:"这很有趣,因为我们在一起,我们在学习,我们在一路成长,你到达目的地,一切都很好。"一旦乘客开始说,为什么?我们为什么要走这条弯路?呃,你知道的,我的妻子 Vivian,呃,我们大学时经常一起公路旅行,我开车的时候会绕一些弯——你知道的,我就是暂时不跟 Google Maps 走了,因为我们在 Napa 还是哪里,风景很好。我就说:"走吧,欣赏一下乡村风光。"她是一个非常使命驱动的司机。她会说:"我们为什么不走最高效的路线?"我不得不向她解释,享受乡村风光本身就是一种美好。然后一旦她信任了我最终会回到高速公路上,她就接受了。她会说:"好吧,好吧。这又是 Anjney 的一次蜿蜒绕路,但我们会回到正轨的。"然后我们就规划一个稍长一点的行程。而且我们在没有好风景的时候也不会绕路。我喜欢风景。我喜欢好看的风景。嗯,所以只要大家对目的地是什么、什么情况下可以偏离路线达成一致,我觉得就没问题。
**Anjney Midha:** And along the way, it's crazy how many distracting offers happen. I think about a company as a road trip, right? You have a destination in mind, and then you have a car, and you're in the driver's seat if you're the founder, and you get your friends together in the car and say, we're going there, guys. And sometimes you get lost. You've got to take some detours and turns. But as long as everyone's like, "This is fun because we're all in it together and we're learning and we're improving along the way," you get to your destination and everything is great. The minute the passengers start going, why are we taking this detour? Uh, you know, my wife Vivian, um, we used to go on road trips together in college, and I'd be driving and I'd take a meandering route. You know, I'd just stop following Google Maps for a bit, because we were in Napa or whatever and it was scenic. I'd be like, "Let's go enjoy the countryside." And she's a very mission-driven driver. She's like, "Why are we not taking the most efficient path there?" And I had to explain to her that there's beauty in just enjoying the countryside. And then once she trusted that I would eventually come back to the highway, she was like, "Okay, fine. This is one of Anjney's meandering detours, but we're going to get back on track." And then we just planned for a slightly longer road trip. And we also don't take detours when they're not worth taking, because there's no scenic view. I love views. I love scenic views. Um, so as long as everyone's aligned on what the destination is and what the protocol is for when to deviate, then I think you're fine.
**Mike Abbott:** 我觉得我想补充一点,就是有些很实际的东西——比如如果你是一家企业软件公司,你解决的问题是止痛药还是维生素?因为我见过一些公司,你知道的,创始人非常有激情,文化也很一致,但如果他们做的东西对客户来说不是必需品,那你就会死掉。第三点我要说的是钱的问题。你作为一家公司花钱的时候,是不是像花自己的钱一样?
**Mike Abbott:** I would add to that, though, that there are real things too. Like, if you're an enterprise company: are you a painkiller or a vitamin? Because I've seen companies, yes, where, you know, the founders are really passionate and the culture is all aligned, but let's say what they're building is not a must-have for a company; then you will die. And the third thing I'd say is around dollars. Do you spend money as a company as if it was your own money?
**Anjney Midha:** 完全是。完全是。
**Anjney Midha:** Totally. Totally.
**Mike Abbott:** 因为我经历过——我在 99 年参与了一家公司,2001 年我不得不裁掉 180 人,那时公司一共才 220 人。
**Mike Abbott:** Because I saw this. I was involved with a company in '99, and in 2001 I had to lay off 180 people out of 220.
**Anjney Midha:** 是啊。
**Anjney Midha:** Yeah.
**Mike Abbott:** 那太残酷了,原因就是从第一天起就没有财务纪律。在公司成立三年之后再去加入财务纪律,非常非常困难。
**Mike Abbott:** And it was brutal, and it was because there was no fiscal discipline from day zero. It's very hard to add fiscal discipline three years into a company. It's very difficult.
**Anjney Midha:** 不,你做不到。这些都是单向门,就像信任和文化一样。一旦你搞砸了就没法恢复——
**Anjney Midha:** No, you can't. These are one-way doors, like trust and culture. You don't get to recover those if you make the —
**Mike Abbott:** 我的意思是,可以说财务纪律就是文化的一部分。就是说你公司里的每个人花钱的时候,是不是都当作自己的钱在花?而且——而且——你看,我觉得现在有越来越多的方式来实现这种对齐。十年前创业公司刚起步的时候,我们能看到钱花在哪里的工具非常非常少。
**Mike Abbott:** I mean, arguably, fiscal discipline is part of culture. Does everyone in your company act like it's their own dollars being spent? And look, I think there are more and more ways to align that today. Ten years ago, when startups were just taking off, the tools we had for visibility into where people were spending were super minimal.
**Anjney Midha:** 我是说,考虑到当时公司建设的工具有多原始,Google 居然还能做起来,真是令人惊叹。是的,因为工具太差,Google 内部也有很多不好的行为,人们乱花钱。我是说你还记得我们刚开始一起做风投的时候,有些公司的开支完全失控——
**Anjney Midha:** I mean, it's actually remarkable, given how primitive all the tools for company building were, that Google ever got anywhere. And there was a lot of bad behavior inside of Google as a result, because people were spending — I mean, you remember when we started working on venture together, the spend was out of control at some of the —
**Mike Abbott:** 哦是的,某些条目上。
**Mike Abbott:** Oh yeah, on some of the line items.
**Anjney Midha:** 我在 Twitter 也见过这种情况——花在一些对我来说毫无意义的事情上。呃,我觉得你应该把钱投资在人身上——钱就应该花在这儿。
**Anjney Midha:** I saw this even at Twitter: spending on things that didn't make sense to me. Uh, I think you invest in people. That's where dollars go.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 而不是其他东西。
**Anjney Midha:** Not on other things.
**Mike Abbott:** 嗯,人和算力。
**Mike Abbott:** Well, people and compute.
**Anjney Midha:** 对。算力。[笑声] 算了,当然了。
**Anjney Midha:** Yeah. Compute. [laughter] Never mind. Of course.
**Mike Abbott:** 花在算力上。是的。你知道我学到了什么吗?在人身上的花费不会规模化增长太多,因为那会伤害文化。就像我觉得我们现在正在经历的——大团队——大团队是很难管理的。我是说,回到之前那个问题,我在 General Motors 管 19,000 人。那是一个疯狂的数字。
**Mike Abbott:** Spend it on compute. Yeah. You know what I've learned? Spending on people doesn't scale that well, because it hurts culture. I think that's what we're going through right now: big teams are hard. I mean, going back to the earlier question, I had 19,000 people at General Motors. That's a crazy number of people.
**Anjney Midha:** 是啊。
**Anjney Midha:** Yeah.
**Mike Abbott:** 你知道的,我在 Apple 管 2,000 人。那也挺多的。但是如果我看我负责 iMessage 的团队,才 25 个人。很小的。这也是我们当前裁员潮的问题之一——很多现在发生的裁员被归咎于 AI。
**Mike Abbott:** You know, I had 2,000 people at Apple. I mean, that was big. But if I look at my team that ran iMessage, it was like 25 people. It was tiny. This is also one of the issues with the layoff era we're in right now, where a lot of the layoffs that are happening are being blamed on AI.
**Anjney Midha:** 不是的。
**Anjney Midha:** It's not.
**Mike Abbott:** 不,那只是零利率时代过度招聘的后果。有很多高管会说:"我不想承认——"
**Mike Abbott:** No, it was just overhiring during the zero-interest-rate era. And there's a lot of execs who are like, I don't want to admit —
**Anjney Midha:** 对。
**Anjney Midha:** Yep.
**Mike Abbott:** "——我们搞砸了。"
**Mike Abbott:** That we messed up.
**Anjney Midha:** 对。但我们这些知道当时发生了什么的人都知道——我们那时候就在说,你不需要这些人,这个什么什么 VP 不需要 2,000 人的编制。是啊。
**Anjney Midha:** Yep. But those of us who knew what was happening were saying it at the time: this VP of whatever does not need a 2,000-person headcount. Yeah.
**Mike Abbott:** 但你就是想能说——或者说那些钱。我记得有几个我想从 Apple 挖的人,他们从 Meta 拿到了差不多两倍于我们报价的 offer。当然 Apple 本来就抠门,对吧?但还是——那个时代太疯狂了。所以我会跟那些人说,你应该去,如果那是能改变人生的钱的话——是的,如果你认同那边的使命的话。
**Mike Abbott:** But you just want to be able to say — or, like, the dollars. I can remember a couple of people I was trying to hire at Apple who would get offers from Meta at like 2x what we were offering. Now, granted, Apple's cheap, right? But still, it was such a crazy era. So I would tell those people: you should go if that's life-changing money, yeah, if you're aligned with the mission.
**Anjney Midha:** 是啊。就是我们的合伙人 John Doerr 以前常说的,你知道的,雇佣兵还是传教士,你想当哪个?
**Anjney Midha:** Yeah. It's that whole — our partner John Doerr used to say, you know, mercenary or missionary, like where do you want to sit?
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 对。
**Anjney Midha:** Right.
**Mike Abbott:** 呃,这个问题很长。
**Mike Abbott:** Uh this is a long one.
**Anjney Midha:** 好吧。
**Anjney Midha:** Okay.
**Student:** 呃,我们看第 23 题。随着 LLM 性能趋同、原始权重成为大宗商品,主要的经济护城河是否正在转向 harness 工程——也就是 agent 编排、工具集成和状态管理层——这些把模型变成高价值产品的部分?
**Student:** We're on 23. As LLM performance converges and raw weights become a commodity, is the primary economic moat shifting toward harness engineering: the agentic orchestration, tool integration, and state management layers that turn models into high-value products?
**Anjney Midha:** 我觉得这又是那个问题的另一种问法。
**Anjney Midha:** I think this is a re — another attempt at that question.
**Mike Abbott:** 好,这个——好吧,我要暂停一下,稍微发个牢骚。这个护城河的话题不断出现,因为我觉得风险投资人把"护城河"这个词完全用滥了。世上没有护城河。没有技术护城河。嗯。我觉得有优势,有先发优势,但护城河是文化、信任、关系。是企业信任你能交付真正的价值。
**Mike Abbott:** Okay, I'm going to pause for a second and rant a little bit about this moat thing. It keeps coming up because I think venture capitalists have completely overused the term "moat." There are no moats. There are no technical moats. I think there are advantages and there are head starts, but the moats are culture, trust, relationships. It's enterprises trusting you to deliver real value.
**Anjney Midha:** 也许还有领域专业知识。
**Anjney Midha:** Maybe domain expertise.
**Mike Abbott:** 领域专业知识。比如——这场争论,我们已经在这个领域四年了各位。别再争论什么是"主要经济模式"了。有先发优势和竞争优势,然后护城河来自于——你在构建一种优秀的执行文化,不断地创新、一次又一次地创新,你的供应商信任你,你的合作伙伴信任你,你的银行以更低的资金成本借钱给你。当你——你知道我看 Visa 和 Mastercard 的例子,Visa 就是一个非常好的例子,我们讨论过这个对吧?Visa 是世界上最成功的企业之一。它的护城河是什么?就是信任。他们不拥有银行。他们什么都不拥有。他们不是政府机构。
**Mike Abbott:** Domain expertise. And this debate: we're four years into this, guys. Let's stop debating what the primary economic moat is. There are head starts and competitive advantages, and then the moat comes from building a great execution culture that keeps innovating over and over again, where your vendors trust you, your partners trust you, your banks lend to you at a lower cost of capital. You know, I look to Visa and Mastercard, for example. Visa is such a great example; we've talked about this, right? Visa is one of the world's most successful businesses. What is their moat? It's just trust. They don't own the banks. They don't own anything. They're not a government entity.
**Anjney Midha:** 嗯。
**Anjney Midha:** Mhm.
**Mike Abbott:** 但他们代表信任。如果一个新市场的交易是通过 Visa 运行的,你知道那是安全的,对吧?
**Mike Abbott:** But they stand for trust. Well, if a transaction in a new market is running through Visa, you know that's secure, right?
**Anjney Midha:** 而如果他们出了数据泄露,那就完了。
**Anjney Midha:** And and if they had a breach, then it would be done.
**Mike Abbott:** 那就完了。我觉得模型也没什么不同。有些领域的模型能力会给你先发优势,但归根结底,客户付钱买的不是单一模型。他们付的是一次又一次又一次的——品牌信任,相信你会持续创新,你会保持在前沿。Claude 为什么?因为两年来,Claude 一直是编码方面最好的。如果他们只在一个季度是编码最好的,他们今天的处境会非常不同。但他们创造了一种方法来持续保持最好。有时候需要 60 到 90 天,因为他们的算力供应链积压了或者在融资。但总体而言,我们就是要做最好的编码智能提供者。
**Mike Abbott:** Then they'd be done. And I think models are no different. There are some areas where model capabilities get you a head start, but at the end of the day, what the customer is paying for is not a single model. They're paying, over and over and over again, for the brand trust that you'll keep innovating, that you'll stay at the frontier. Why Claude? Because for two years now, Claude has been the best at coding. If they had been the best at coding for just one quarter, they'd be in a very different place today. But they've created a way to repeatably stay the best. Sometimes it takes them 60 to 90 days, because their compute supply chain is backed up or they're raising money. But by and large, the message is: we want to be the best provider of coding intelligence.
**Anjney Midha:** 这就是企业愿意为之付费的东西。
**Anjney Midha:** That's what enterprises want to pay for.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 是可靠性。以及——
**Anjney Midha:** It's reliability. And —
**Mike Abbott:** 我得说 Codex 现在也进步很大了。
**Mike Abbott:** I would say Codex has gotten a lot better though.
**Anjney Midha:** 是的。所以我觉得企业会为 Anthropic 付费——这就是多模型的事情,对吧?我觉得他们会为 Anthropic、OpenAI 和 Gemini 付费,还有谁最好就付费给谁。而且如果你不是那三四家公司之一——比如欧洲的 Mistral 之类的——就很难在编码上做到最好。但这种认为护城河是某种静态稳态的想法——我们得停止这么说了。就是——我们把这个词废除吧。
**Anjney Midha:** Yeah. So I think enterprises will pay for Anthropic, and that's the multi-model thing, right? I think they'll pay for Anthropic and OpenAI and Gemini and whoever is the best. And it's very hard to be the best at coding if you're not one of those three or four companies, like Mistral in Europe and so on. But this idea that moats are somehow static and steady-state? We've got to stop talking about that. Let's retire the term.
**Mike Abbott:** 好的,好的。
**Mike Abbott:** Yeah. Yeah.
**Anjney Midha:** 你想看第 24 题吗?我们还有 10 分钟。
**Anjney Midha:** Do you want to do 24? We got 10 minutes here.
**Student:** 为什么视频模型在动作预测方面比从头训练的模型有更好的先验?
**Student:** Why would video models have better priors on action prediction than something trained from scratch?
**Anjney Midha:** 嗯,视频模型就是从头训练的。我不觉得这是个假设性问题。是的。下一个。
**Anjney Midha:** Well, video models are trained from scratch. I don't think this is a hypothetical. Yeah. Next one.
**Mike Abbott:** 好吧。
**Mike Abbott:** All right.
**Student:** 在技术栈上最好的位置在哪里?公司、基础设施提供商、LLM、应用层——最好的位置在哪里?
**Student:** Where is the best place to be on the stack? Company, infrastructure provider, LLM, applications — where is the best place?
**Mike Abbott:** 我不知道那会是什么。在技术栈上最好的位置在哪里?这取决于你对什么感兴趣。
**Mike Abbott:** I don't know what that is going to be. Where's the best place to be on the stack? It depends on what you're interested in.
**Anjney Midha:** 是的。你这辈子想做什么?你的使命是什么?记住,一切都回归到使命对齐。
**Anjney Midha:** Yeah. What do you want to do in life? What is your mission? Again, remember it all comes back to mission alignment.
**Mike Abbott:** 是的。你想把时间花在哪里?你在这个星球上短暂的时间里只有一种资产。没错。时间。
**Mike Abbott:** Yeah. Where do you want to spend your time? You have one asset in your short time on this planet: time.
**Anjney Midha:** 就是这样——你说我的目标是——我想推动能源的前沿。好,那就去创建一家能源公司,解决这个 AI 时代的能源瓶颈。如果你觉得自己是一个了不起的芯片设计师,你想推动前沿,你想让墓碑上写着"这里长眠的是 Mike,最好的芯片设计师",好,我们就创办一家芯片公司。
**Anjney Midha:** And that's the thing. You say, my goal is to push the frontier of energy. Okay, then go build an energy company solving the energy bottleneck for this AI era. If you see yourself as an amazing chip designer and you want to push the frontier, you want your tombstone to read "here lies Mike, the best chip designer," fine, let's start a chip company.
**Student:** 嗯,那就是判断标准。随着 AI 模型成为大多数系统的核心依赖,你如何看待应用代码和基础设施之间的边界演变?
**Student:** Um, that's that's the heuristic. How do you see the boundary between application code and infrastructure evolving as AI models become a core dependency in most systems?
**Anjney Midha:** 我觉得这个问题很有意思,因为边界确实在模糊。
**Anjney Midha:** I think that's a pretty interesting question because it is blurring.
**Mike Abbott:** 你怎么看——我们需要 Rosie 吗?我需要再来点水。抱歉。说到你怎么看应用代码和基础设施之间的边界——随着 AI 模型成为大多数系统的核心依赖?这个边界不存在的,各位。[笑声] 好吧,让我试着稍微更耐心一点来说。有一个 harness——
**Mike Abbott:** How do you see the — do we need Rosie? I need some more water, sorry. As for how the boundary between application code and infrastructure evolves as AI models become a core dependency in most systems: this boundary does not exist, guys. [laughter] Okay, let me try to be a little more patient. There's a harness —
**Anjney Midha:** 嗯。
**Anjney Midha:** Mhm.
**Mike Abbott:** ——就是一个让人们可以在技术栈的不同部分插入不同模型的系统。你可以把那个叫做应用吧。我只是觉得把 app 和 infra 分开来思考对我没什么意义。
**Mike Abbott:** — which is a system that allows people to plug different models into different parts of the stack. You could call that an application, I guess. I just think reasoning about apps versus infra doesn't make sense to me.
**Anjney Midha:** 是的。对我有意义的是按领域来思考、按任务分布来思考、按上下文反馈循环来思考——那是贯穿整个技术栈的——模型接口、应用部署——最终的 eval 是一个结果层面的 eval。
**Anjney Midha:** Yeah. What does make sense to me is reasoning about domains, task distributions, and context feedback loops that run up and down the stack: model, interface, application, deployment. The eval is ultimately an outcome eval.
**Mike Abbott:** 嗯。而且——有意思,因为我其实对那个问题的理解稍有不同。
**Mike Abbott:** Mhm. And and so it's interesting because I actually read that question a little bit differently.
**Anjney Midha:** 你怎么理解的?
**Anjney Midha:** How did you read it?
**Mike Abbott:** 嗯,我觉得有个很有趣的现象,我在 Claude 或者 Codex 上注意到的——这些 LLM——你知道,作为一个开发者,我干了大概 25 年了——你构建抽象和 API,但是——我们当初做那些事的原因,是为了让代码对人类更可读、更容易使用。
**Mike Abbott:** Well, I think there's this interesting thing I'm noticing with Claude or Codex, with these LLMs. You know, as a developer, and I've been doing this for about 25 years, you build abstractions and APIs. But the reason we did that was basically to make code more human-readable and easier to work with.
**Anjney Midha:** 是的,那个东西将要消失了。我觉得——用 LLM 的话你不需要抽象。所以它模糊了——基本上我们以前在架构上的分解可能某种程度上还会存在,但它会消失。
**Anjney Midha:** Yeah, that's kind of going to go away. With an LLM, you don't need an abstraction. So it blurs; the decomposition we used to have architecturally will probably still exist to some degree, but it's going to go away.
**Mike Abbott:** 所以我同意最自然的做事方式就是——你不需要看代码。你只需要告诉某人去做。所以假设你可以直接告诉你的 agent 去做,然后当你想要检查工作的时候,好,你去看代码。你看 VM。所以想象一种类似 Slack 的界面,里面都是 agent,它们在写代码,而你在观察。你有一个可观测性面板,你看到:"好,给我看看你的工作,因为这里有些不对。"然后你去检查 VM 和代码——这应该跟你带一个团队没什么区别。
**Mike Abbott:** So I agree that the most natural way to get work done is that you don't have to look at code. You just tell somebody to go do it. So let's say you can just tell your agent to go do it, and when you want to double-check the work, okay, then you go look at the code. You look at the VM. So imagine some kind of interface like Slack, which is just agents doing the coding while you observe. You've got an observability pane, and you're like, okay, show me your work, because something's off here. Then you introspect the VM and the code, which should be no different than if you were leading a team.
**Anjney Midha:** 没错。
**Anjney Midha:** Exactly.
**Mike Abbott:** 是的。
**Mike Abbott:** Yeah.
**Anjney Midha:** 所以这个 infra 和 app 的分法——消失了。
**Anjney Midha:** So this infra versus app thing — goes away.
**Mike Abbott:** 消失了。问题只是:你在雇用哪些 agent?你用什么工具做内省?你知道的,你有什么告警系统来说,嘿,这段代码是这个 agent 团队写的?你可能想再检查一下。而这就是我真正兴奋的解决方案空间,因为这就是人类想要与之交互的解决方案空间,对吧?接口层。嗯,但这个 infra 和 app 的分法——
**Mike Abbott:** Goes away. It's just: which agents are you hiring? What tools are you using for introspection? What alerting do you have to say, hey, this code was written by this team of agents, you may want to double-check it? And that's the solution space I'm really excited about, because it's the one humans want to interact with, right? The interface layer. Um, but this infra-versus-apps thing —
**Anjney Midha:** 嗯。
**Anjney Midha:** Mhm.
**Mike Abbott:** 我觉得是一种老派的思维方式。抱歉。抱歉冒犯了。
**Mike Abbott:** I think is an old-school way of thinking. Sorry. Sorry to insult.
**Anjney Midha:** 不不[笑声]不不,完全没有。嘿,我是老古董了,哥们。
**Anjney Midha:** No, [laughter] no, no, not at all. Hey, I'm crusty, man.
**Mike Abbott:** 不,不是的。我也发现自己比想象中白头发更多了。
**Mike Abbott:** No, that's not true. I've got more gray hair than I realized, too.
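A loose, purely hypothetical sketch of the "Slack of agents" workflow Mike describes above, where you delegate tasks and then introspect an agent's work through an observability pane. The `Agent` class, `do_task`, and `show_work` are invented names for illustration, not any real product's API.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """A toy coding agent: it does delegated work and keeps a log you can introspect."""
    name: str
    log: list = field(default_factory=list)

    def do_task(self, task: str) -> str:
        # Stand-in for real code generation; we only record what happened.
        result = f"committed patch for: {task}"
        self.log.append(result)
        return result

    def show_work(self) -> list:
        # The "observability pane": ask the agent to show its work.
        return list(self.log)

# You "hire" agents and delegate, the way you would lead a team.
team = {"backend": Agent("backend"), "frontend": Agent("frontend")}
team["backend"].do_task("add rate limiting")
team["frontend"].do_task("fix login button")

# Something looks off, so you double-check one agent's work.
print(team["backend"].show_work())  # ['committed patch for: add rate limiting']
```

The point of the sketch is the shape of the interaction, not the implementation: delegation first, code inspection only on demand, exactly as you would with a human team.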
**Student:** 一个 solo founder 如何在 MVP 中设计有机的产品内分发机制,让最初的一百个用户自然地招募到下一千个用户,而不需要主动营销?
**Student:** How can a solo founder design organic in-product distribution into their MVP so that their first hundred users naturally recruit the next thousand without requiring active marketing?
**Anjney Midha:** 嗯,我是说这就是经典的网络效应。取决于产品。我是说我——你知道,但如果你能做到——比如一个基于邀请的服务,对吧?那你就有了网络效应,你就可以做产品驱动增长,也就是 PLG。
**Anjney Midha:** Well, I mean, this is the classic network effect. It depends on the product. But if you can make it something like an invitation-based service, right? Then you get a network effect and you can do product-led growth, PLG.
**Mike Abbott:** 是的。一个好例子就是 Arena——LM Arena,几年前是 Berkeley 的一个副项目,一个测试应用。基本上就是一个研究前端,做它的研究生——Anastasios 和 Wei-Lin,来自 Ion Stoica 的实验室——当时在找一些算力额度。我那时在 a16z,所以我们给了他们一些开源赞助,而我观察到了一个有趣的网络效应:有几个实验室把预发布的 checkpoint 放到 Arena 上,想在正式发布前评估模型有多好。
**Mike Abbott:** Yeah. A good example of this was Arena. LM Arena was a side project at Berkeley a couple of years ago, a testing app. It was basically just a research front end, and the grad students behind it, Anastasios and Wei-Lin, out of Ion Stoica's lab, were looking for some compute credits. I was at a16z at the time, so we gave them some open-source grants, and what I observed was this interesting network effect: a few of the labs were putting up pre-release checkpoints on Arena to get evaluations of how good the model was before release.
**Anjney Midha:** 这些是隐身模型。
**Anjney Midha:** These were stealth models.
**Mike Abbott:** 嗯。
**Mike Abbott:** Mhm.
**Anjney Midha:** 然后 Twitter 上就会疯传说 Arena 上有一个隐身模型。
**Anjney Midha:** And then it would go viral on Twitter that there was a stealth model on Arena.
**Mike Abbott:** 哦——然后就带来了流量——
**Mike Abbott:** Oh — then it would drive —
**Anjney Midha:** 于是人们就注册 Arena,然后他们测试模型,这就给了实验室他们需要的 eval 数据来决定这个 checkpoint 是否应该发布。事实上,给他们第一个突破机会的人之一是 Liam Fedus,他是 ChatGPT 的联合创建者,建立了 OpenAI 的 post-training 飞轮,他也是 Periodic Labs 的联合创始人——就是我们现在坐的这个地方。
**Anjney Midha:** So then people would sign up for Arena and test the models, and that would give the labs the eval data they needed to decide whether a checkpoint should be released. In fact, one of the people who gave them their first break is Liam Fedus, who was a co-creator of ChatGPT, set up the post-training flywheel at OpenAI, and is a co-founder of Periodic Labs, which is where we're sitting.
**Mike Abbott:** 但那就是 post-training 作为循环这个概念开始建立飞轮的时候。是的。而那就是网络效应——一旦有人成功地在发布前在 Arena 上测试了他们的模型,下一个实验室也想这么做,然后——然后 Gemini 也想——这种复利效应——所以那就是一个例子,他们做了一些有趣的 AI 评估研究,把它作为开放项目发布出去,社区开始使用,然后网络效应就越滚越大、越滚越大。嘿,你知道吗,Arena 现在被企业用来做特定领域的安全和评估测试——编码的、图像的、还有各种定制 eval。但是,嗯,有意思的是那个项目起初只是一个副项目,一个开放社区项目。我觉得还有大量类似的事情可以做——让社区网络效应运转起来,不需要营销就能获得增长。
**Mike Abbott:** But that's when this idea of post-training as a loop began to build that flywheel. Yeah. And that's the network effect: once somebody had successfully tested their model on Arena before release, the next lab wanted to do the same, and then Gemini wanted to, and it compounds. So that's an example: they did some interesting research on AI evaluation, put it out there as an open project, the community started using it, and the network effect built and built and built. And hey, Arena is now being used by enterprises to do in-domain safety and evaluation testing, for coding, for image, all these different custom evals. But, um, it's interesting that the project started as, again, a side project, an open community project. And I think there's tons to be done like that, where you get a community network effect going and get that growth without marketing.
**Student:** 让我看看,我们还有四分钟。AMP 如何管理闲置算力?还是说假设所有节点都会 100% 利用?
**Student:** Let's see, we got four minutes left. How does AMP manage idle compute, or is the assumption that all nodes will be 100% utilized?
**Anjney Midha:** 绝对不是假设会 100% 利用。那是目标。但你知道我的联合创始人 Sebastian 在工程这边,他在 Google 内部运营并构建了所有内部调度基础设施——就是 Borg,以及 GQM——他和 Mihi 一起设计的,Mihi 后来也从 Google 加入了我们。你知道 Google 的节点利用率——我可以说是全球最顶级的——在 95% 以上。而独立生态系统中的节点利用率,特别是单租户集群,大概只有 50%。50%。是的。60%。是的。
**Anjney Midha:** It is absolutely not the assumption that it'll be 100% utilized. That's the goal. But you know, my co-founder Sebastian, on the engineering side, ran the scheduler and built all the internal scheduling infrastructure at Google: Borg, and GQM, which he co-designed with Mihi, who also joined us from Google. Node utilization at Google, which I'd say is best-in-class in the world, is 95-plus percent. Node utilization in the independent ecosystem, especially in single-tenant clusters, is more like 50%. 50, yeah. 60. Yeah.
**Mike Abbott:** 所以那是巨大的浪费。而我们有——你知道的——那就是我们所说的 Grid。Grid 负责在 AMP 生态系统中做所有的负载均衡和调度分配。我们认为 94% 就是重大故障——因为那就是我们试图在我们的投资组合公司、团队等等中运行的规模。
**Mike Abbott:** So that's a huge amount of wastage. And that's what we call the Grid. The Grid does all the load balancing and scheduling allocation across the AMP ecosystem. We consider 94% a major outage, because that's the scale at which we're trying to run across our portfolio companies, teams, etc.
**Anjney Midha:** 在 Google,我觉得 96% 我们就认为是重大节点故障,因为那个规模。嗯。
**Anjney Midha:** At Google, I think 96% was considered a major node outage, because of the scale. Mhm.
**Mike Abbott:** 所以我们有一个动态的分配系统,叫做 Grid,它把不同云、不同供应商和提供商的容量汇集起来,以动态的方式重新分配。
**Mike Abbott:** And so that's how we do it: a dynamic allocation system called the Grid, which pools capacity across different clouds, vendors, and providers and reallocates it dynamically.
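As a rough illustration of why pooling helps (a toy sketch, not AMP's actual Grid; the function and the numbers are invented), a shared scheduler can backfill idle nodes across providers, which is how pooled utilization climbs toward the 95-plus percent range instead of the roughly 50% single-tenant figure:

```python
def pooled_utilization(clusters: dict[str, int], demands: list[int]) -> float:
    """Greedy packing of jobs onto one shared pool of GPU nodes.

    clusters: provider name -> number of nodes it contributes to the pool
    demands:  node counts requested by jobs, served largest-first
    Returns the fraction of pooled nodes that end up busy.
    """
    total = sum(clusters.values())
    free, busy = total, 0
    for d in sorted(demands, reverse=True):
        if d <= free:  # the pool ignores provider boundaries
            free -= d
            busy += d
    return busy / total

# Three single-tenant 100-node clusters each running one 55-node job
# would sit at 55% utilization. Pooled, backfill jobs fill the slack.
print(pooled_utilization({"cloud_a": 100, "cloud_b": 100, "cloud_c": 100},
                         [55, 55, 55, 40, 40, 30, 20]))  # ~0.983
```

A real scheduler also handles preemption, placement constraints, and network locality; the sketch only shows the core arithmetic of why a shared pool wastes far fewer nodes than isolated clusters.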
**Anjney Midha:** 我觉得就到这里了。
**Anjney Midha:** So I think that's it.
**Mike Abbott:** 好的。
**Mike Abbott:** All right.
**Anjney Midha:** Office hours 第一期完成。
**Anjney Midha:** Office hours number one done.
**Mike Abbott:** 各位下周见。
**Mike Abbott:** See you guys next week.
**Anjney Midha:** 各位下周见。周末愉快。
**Anjney Midha:** See you guys next week. Have a great weekend.
**Mike Abbott:** 再见。
**Mike Abbott:** Cheers.