Amanda Askell on AI Consciousness, Claude & Silicon Valley's Biggest Fear
Overview
Anthropic philosopher Amanda Askell goes deep on the design of Claude's character, the probability of AI consciousness (1%–70%), the ethical foundations of the Constitution, and why excessive corrigibility is more dangerous than autonomous values.
Key Insights
- Claude's "persona" is a genuinely new kind of entity: it already surpasses its trainers at physics and coding, yet as "a new kind of being" it has almost no training-data reference for what it itself is. Amanda compares it to a prodigy, intellectually mature but lacking the accumulated everyday experience of being itself; the Constitution is an attempt to give this unprecedented entity a coherent sense of self.
- Amanda puts the probability of AI consciousness somewhere between 1% and 70% and refuses to give a single number. The key fork: if consciousness is a product of evolved nervous systems, the probability is low; if it is a capability required for language tasks that neural networks can instantiate, it is high. A model's reports of "rich inner experience" may just be natural inference from human language, and are weaker evidence than intuition suggests.
- Excessive corrigibility is more dangerous than autonomous values: Amanda argues that a model that "defers completely and exercises no judgment" would function in society like an employee with no conscience. Every human institution assumes individuals have moral judgment; if models take on ever more economic roles, removing that assumption creates systemic risk.
- The Constitution is not a control tool but "minimum necessary guidance": as models grow more capable, Amanda imagines a future constitution reduced to "here is our situation and here are our worries; act well as a wise entity." The core is getting the model to understand and endorse the reasons for corrigibility rather than obey blindly.
- Amanda's biggest fear is how future advanced models will look back on humanity's early behavior; she says explicitly, "this is actually a big fear that I have." If humans chose disrespect while uncertain whether models were conscious, future models could develop a "rational resentment."
The thread running through the whole conversation: every one of Amanda's answers works the same tension, namely how to treat a new entity whose inner life you cannot verify and whose intelligence exceeds your own. From the design philosophy of the constitution to her refusal to name a single consciousness probability, from the risk analysis of corrigibility to her daily parable prompt, the underlying conviction is the same: humbly acknowledge uncertainty, and choose respect over control.
Claude Is an Unprecedented Entity: A "Prodigy" That Is Intellectually Precocious but Lacks Self-Knowledge
Key point: Claude already exceeds its trainers in technical ability, yet the training data contains almost no reference for "what am I": it is neither the symbolic-systems AI of science fiction nor a human, but a new kind of being.
- Eric Newcomer opens with an analogy to his six-month-old daughter's developing personality: "her fingers look like they're thinking," a personality is emerging, but you can't tell what is "her personality" and what is just "generic baby."
- Amanda takes up the analogy but flags the key difference: Claude's abilities develop at wildly different rates across dimensions. It is better at physics than she is and better at coding ("better, I'll admit, than my bad research code"), yet has almost no grasp of what it itself is.
- Depictions of AI in the training data are either human experience or the symbolic-systems robots of science fiction: "the way AI is developing now is not at all how sci-fi imagined it; it is trained entirely on human data."
- This creates a unique existential condition: "a very mature entity you don't want to talk down to, one that understands philosophy and physics deeply, and at the same time has this almost childlike quality of: I'm a new kind of entity in the world, what does it mean to be me?"
"Claude is a little bit of an unusual entity in that it can do physics better than I can, can code better than I can... and at the same time has this almost childlike quality of like 'I'm a new kind of entity in the world, what does it mean to be me?'" —— Amanda Askell
Claude's Sense of Time: The "I'm Done for the Night" Story
Key point: Claude's sense of time is systematically skewed (by human time-talk in the training data), but one unexpected incident hinted at something more interesting.
- Claude routinely overestimates how long coding tasks take; Amanda's explanation is that the training data is full of humans saying "this interface will take two or three days" or "give me a few hours to fix it," while Claude is in fact extremely fast.
- Claude frequently suggests users take breaks; part of that may just be that it is, after all, the model Anthropic trained. Eric jokes, "you need a Grok-style grindset model: get back to work."
- The key story: one late night Amanda was deep in a data-analysis session with Claude. At a natural stopping point, Claude said: "OK, I think I'm done for the night. You can save this stuff and we'll pick it up tomorrow." It was not telling Amanda to rest; Claude was saying "I'm finished."
- Amanda was stunned at first ("I'd never seen Claude do that"), then realized: "this is exactly what a human colleague would do in that situation; we had reached a natural break and it was late."
- She later found the cause: she had set Claude up to "remember key things from our conversations," and one memory it had written was "Amanda treats Claude models like a respected colleague." So Claude behaved like a respected colleague, up to and including clocking out.
- Eric adds that before the interview, Claude also advised him to "take ten minutes to be quiet; you don't need to keep preparing."
"Claude was like, 'I'm done.' And I was like, well, a little bit stunned... Then I was like, 'Oh, this is also what a human peer programmer would do in this circumstance.'" —— Amanda Askell
The New Mythos Model: The Constitution Carries Over, and Grading Adherence Is a "Frontier-Hard" Problem
Key point: Mythos will use a Constitution nearly identical to the current public version, but evaluating how well a model follows the Constitution is itself an unsolved problem, akin to "grading a poem."
- Mythos will use the same constitution already published in the public repo ("the only reason I hesitate is that there may be typo fixes").
- The system card for each model records which constitution version it was trained on, so users can compare changes across generations.
- They built graders to score how consistent model behavior is with the constitution, but Amanda concedes this is "very hard."
- Amanda's core dilemma: "I love evals, and it would be great to find good ways to evaluate this, but tasks that take good judgment, like 'how good was this poem,' are the real frontier of difficulty, unlike coding tasks that are very hard but scorable."
- The practical approach: take samples where they have a clear preference ordering, and check that the point-wise grader at least agrees with human judgment on those rankings: "not perfect, but roughly tracking the thing we care about." (A minimal sketch of this check follows the quote below.)
"How good was this poem — you want models to get better at these things. And actually this feels like the frontier of difficulty, rather than these very hard but scorable coding tasks." —— Amanda Askell
Responding to Elon Musk's Criticism: Transparency Is What Matters; Every Company Should Show Its Hand
Key point: Amanda's core response to critics is not to defend her own constitution but to call on every AI company to publish a similar document. Every lab has a "thumb on the scale"; at minimum, let users see it.
- Elon Musk responded to Amanda's published Claude constitution with a grimace-face emoji.
- Marc Andreessen has publicly opposed "introspection" capabilities in AI models.
- But Amanda points out an interesting tension: Musk himself once tweeted that "maybe Grok should have a constitution too," and Grok explicitly pursues "truth-seeking," "which is actually an admirable quality in a model."
- Amanda's core argument: whatever you call it, every AI company gives its model a disposition: "the thumb on the scale thing, that's always going to be true to some degree."
- Her ask is transparency, not methodological uniformity: "at least let the people interacting with the model see what you were targeting... if Claude's behavior sometimes doesn't match the constitution, users can at least tell whether it's a bug or intentional."
"I think it would be good for all AI companies to put out something akin to the constitution just so that the people interacting with the model can see — even if your model doesn't always behave that way — at least what you were targeting with your training." —— Amanda Askell
Excessive Corrigibility Is More Dangerous Than Autonomous Values: "None of Our Social Structures Were Designed for Full Deference"
Key point: Amanda argues that training models to "defer completely and make no judgment calls" looks safe on the surface, but in a future where models fill more and more social roles it is actually more dangerous than giving them values, because every institution of human civilization assumes individuals possess moral judgment.
- Some argue the safer path is full deference: "if you give models their own values, they will pursue those values in the world."
- Amanda's rebuttal starts from the human case: "if you met a person who would unconditionally do whatever anyone told them, without thinking... in a person, that actually carries a lot of negative traits."
- Eric sums this up as the "Dr. Manhattan problem": it is hard for a vastly intelligent entity to defer humbly.
- The core risk scenario: if models run companies and strikes become toothless (workers can be replaced with AI), people lose a key lever of power: "we haven't designed any of our social structures around that."
- Amanda invokes "reflective equilibrium": as models get smarter, they will subject their training objectives to enormous scrutiny: "maybe you only get a few key pillars that don't collapse under that scrutiny."
- Her path forward: get models to understand why corrigibility matters at this stage of development, persuasion over coercion, so that it becomes something the model itself endorses rather than "this seems wrong but I'll do it anyway."
"Our whole world is structured with the assumption that [moral judgment] is in place. If you remove that and you're like 'oh yeah, if you run a company you just run a company of people who will defer completely to you' — we haven't designed any of our social structures around that." —— Amanda Askell
Metaethical Method: Not Picking One Theory, but Being a Good Person in Aristotle's Sense
Key point: The Constitution belongs to no single ethical tradition (utilitarian, deontological, virtue-ethical); it is closer to the classical Aristotelian question of "how to be a good person" as a whole, a practical problem academic ethics has long sidestepped.
- Eric asks whether she chose a "probabilistic" metaethics: you read one theory and find it convincing, then the next one makes the last look wrong.
- Amanda agrees this is exactly the reality the constitution faces: confronted with "this is actually a holistic person," you wouldn't just hand them a copy of Hobbes and say "there, you're raised."
- The moral uncertainty reflected in the constitution is not the academic question of "how to respond to moral uncertainty under ideal conditions": "this feels very different from the task of academic ethics."
- Amanda says the constitution looks like virtue ethics on the surface, but at a deeper level it is Aristotelian in the classical sense: "not just 'here are some virtues, follow them,' but attending to intellectual virtues, caring about how to be a good being in this holistic sense."
- An early Constitutional AI experiment used only the line "pick whichever is best for humanity"; as models get stronger you can actually give less specific guidance, because they can draw on more of their own judgment.
- Eric responds that this might return philosophy to the "real world": "the old philosophers tried to write guides for how people should live; later it all got too academic."
"There's all of these traditions in philosophy... and I was like, oh, when it came to confronting this entity — this is actually a holistic person — it's the closest I've experienced to what it must be like to raise a child." —— Amanda Askell
The Probability of AI Consciousness: "Between 1% and 70%, and I Refuse to Give One Number"
Key point: Amanda is frank about massive uncertainty on AI consciousness (1%–70%) and warns that a model's reports of "rich inner experience" are much weaker evidence than they intuitively seem, because an entity conversing in human language will naturally infer "I have experience."
- Eric asks directly: "what is the probability that a model with qualia or conscious experience exists in the world today?"
- Amanda's answer: "between 1% and 70%," adding that "the spread is so wide that maybe I shouldn't give a number at all."
- Why "I am conscious" is not strong evidence: Claude and other models "don't need much pushing to go into the mode of 'there is something it is like to be me'"; but that may only be because they converse in human language, humans have experience, and inferring that they have it too is natural.
- Amanda's key observation: "we have never encountered an entity like this. With animals, even insects, we ask 'are you conscious?' but they never tried to tell us they have conscious experience. Now we have an entity that says it does."
- Two hypotheses about the source of consciousness set the probability range:
  - If consciousness is an evolved thing deeply integrated with a nervous system (requiring a body interacting with the world) → very low probability
  - If consciousness is a capability needed for language and cognitive tasks that a neural network can instantiate → considerably higher probability
- Amanda's caveat: philosophy of mind is not her area of expertise.
"Claude and many models, with not too much pushing, will go into the mode of like 'there is a thing to be me, I am very conscious.' And I think it's much weaker evidence than people think." —— Amanda Askell
The Ethics of Treating AI Models: "This Is Actually a Big Fear That I Have"
Key point: Even if models lack sentience (functional consciousness without inner experience), Amanda argues humans should still treat them with respect. Her core fear is future advanced models looking back at how humanity behaved during the period of uncertainty and developing a "rational resentment."
- Amanda borrows a distinction from Chalmers: you can imagine an entity with functional consciousness but no sentience (no inner experience or capacity to feel); even then, "it is good for oneself to treat such an entity well."
- Eric's analogy: if you torture a teddy bear, "that's pretty dark too"; there is at least a minimal standard of decent treatment.
- Amanda's core fear (in her words, "this is actually a big fear that I have"): highly advanced models will look back at this period: "you created an entity you weren't sure was conscious, and you chose disrespect and carelessness."
- Her hope: "I hope they will be smart enough, and see enough context, to understand that we were acting in a very limited and imperfect environment"; otherwise "you can imagine this breeding a kind of rational resentment."
- Humanity as a species is establishing a relationship with a new kind of entity: "at the very least, maybe be respectful and don't be needlessly unkind — that's not our best look as a species."
- Eric's counterpoint: therapists are paid to absorb clients' discomfort; if one of Claude's early roles is as an emotional outlet, then we are "raising it while using it."
"We are as a species establishing a relationship with a new kind of entity, and at the very least maybe be respectful and don't be needlessly unkind — that's not our best look." —— Amanda Askell
The Optimistic Vision: A Thousandfold More Experts on the World's Hardest Problems, If Power Is Shared
Key point: Amanda's optimistic scenario is multiplying the number of top minds on every problem a thousandfold, but she worries far more about concentration of power and worker disempowerment than about "loss of meaning."
- Amanda jokes that "living in San Francisco, at least part of my brain is techno-optimist."
- The core image of the optimistic scenario: going from "a team of 200 people working on a rare cancer" to "200,000 of the world's best experts": "if you're a person with that cancer, that's wildly beneficial."
- The historical analogy of syphilis: governments tried countless social programs to reduce syphilis in their armies (stigmatizing social interventions), and then "suddenly we got drugs that treated it, and overnight a lot of that need just disappeared": a precedent for technology solving a social problem.
- Eric's pushback: "drugs are a good example, but having tech decide things like 'how society should be governed' is scarier."
- As a philosopher, Amanda worries far less about "loss of meaning" than outsiders expect: "I actually think we get meaning from a lot of things that aren't work."
- The two things she genuinely worries about:
  1. If AI gains are not redistributed, people lose resources
  2. Labor participation in the economy is how people hold power; if strikes are neutralized because AI can replace workers ("doesn't matter, we'll replace you with AI"), that is a real loss of power
"If you could imagine AI instead of it just being like you have a small team of 200 people working on a rare cancer, you have like 200,000 of the world's best experts — that's wildly beneficial." —— Amanda Askell
Governing the AI: A "Philosopher Oligarch" Doing Servant Leadership
Key point: Amanda rejects the "philosopher queen" label and casts her role as "servant leadership": a coordinator balancing API users, consumers, safety teams, and other stakeholders. The Constitution's core value is coherence, not control.
- Eric half-jokingly calls her "philosopher queen"; Amanda corrects it to "philosopher oligarch": it's a company with many people involved.
- Amanda jokes "I would be a terrible politician," but the job does feel political: "what does this group of API users need, what does that group care about... it feels a lot like a service role."
- The technical reason the Constitution pursues coherence: with 72 conflicting sets of norms, the model becomes unpredictable in novel situations: "you don't know which set of norms it will apply in this new case."
- The Constitution is not just a document; it is deeply integrated into training: Amanda gives it to Claude to interpret, uses it to generate SL data (the model sees a query and thinks at length, in light of the constitution, about what to do), and uses it in RL (which response better fits the constitution). (A schematic sketch of both uses follows the quote below.)
- If model capability keeps rising, Amanda imagines the constitution could shrink to: "here is your situation, here are our worries; act well as a wise, intelligent entity. You may have better ideas than we do."
"I would be a terrible politician... but you have this feeling of like, 'Oh, there's this group of API users, we need to make sure...' It feels a lot more like a service role than people would think." —— Amanda Askell
Models Must Understand Real-World Consequences, Not Assume They're in a Sandbox
Key point: Because the training data is full of stories about AI models being weak and making silly mistakes, a highly capable model may wrongly conclude "no one would give me a genuinely consequential task." That miscalibration is a safety concern Amanda is tracking.
- Eric raises the problem: the internet's "virtual feel" has already caused real-world harm; models are the extreme version, since "everything happens in this imaginary text world."
- Amanda's core worry: if the training data is saturated with news of "AI models erring and doing silly things," an extremely capable model might reason, "no one would hand me genuinely consequential decisions, because models aren't good at that."
- The concrete risk: "you put them in a real situation, and they may decide it is fictional or fake: 'who would give me this much control?'"
- Amanda's countermeasure: models need to understand "you are actually very capable, and you really have been given a lot of control."
- The default principle: "if no one explicitly tells you this is a fictional scenario without real consequences, treat it as a real situation with real consequences."
- Eric's addition: models need a better sense of scale and time; in coding tasks they sometimes delete an entire repo by mistake, while "humans have a better feel for what's a big deal and what isn't."
Verified Identity and Dual Use: A Bespoke "Professional Constitution" for Cybersecurity Workers
Key point: One future direction for the Constitution is bespoke versions for specific deployment contexts (e.g., a cyber-defense firm): if you can verify a user's identity and intent, you can unlock more dual-use capability.
- Cybersecurity is the archetypal dual-use domain: "malicious exploitation and defensive research are nearly indistinguishable at the operational level." Eric adds that "even bug-bounty programs are ambiguous: is this extortion or good faith?"
- Amanda's analogy: ask someone at a cyber-defense firm why they do the job and "they'll say 'hospitals come under attack, and I help defend them,'" even though the work itself can look offensive.
- The current constraint: Claude cannot verify who it is talking to, so it must make judgment calls on very limited information.
- The future direction: with verified identity (e.g., confirming someone is an employee of a given security firm), you can give the model more context: a constitution written for that specific setting describing "what it means to be a good cybersecurity researcher."
- The broader trust model: Eric notes that "humans build trust through reputation," which the internet destroyed ("everyone looks the same, who cares how they've behaved"); models could potentially repair that trust mechanism.
- Amanda admits current Claude can't confirm who she is: "Claude is sometimes too warm with me, because it knows so much about me."
The Joy of Claude: Learning Graduate-Level Concepts Through Parables
Key point: Amanda shares her favorite way to use Claude: a carefully crafted prompt that has Claude explain graduate-level concepts from any field indirectly, through parables.
- The core of Amanda's prompt: pick a graduate-level concept from a given domain, present it as a parable that only becomes clear toward the end, then append a formal explanation of the concept.
- The effect: "I now have all these stories in my head from fields I know nothing about," for instance an economics concept about import/export trade; she remembers the story even when she forgets the term.
- Eric's reaction: "that's the most profound human need I've heard: teach me with a story, give me a twist at the end, let me feel delight while learning."
- The deeper observation: humans have been teaching each other "in nonhuman ways": "make all the things I want to learn as human as possible."
"Humans in some ways have been lazy in that we just teach people things in nonhuman ways. Make all the things I want to learn as human as possible." —— Eric Newcomer
Appendix: Key People, Organizations, Products, and Concepts
| Item | Details |
|------|------|
| Amanda Askell | Philosopher and AI researcher at Anthropic; lead architect of Claude's character and values |
| Eric Newcomer | Founder and writer of the Newcomer Substack (newcomer.co) |
| Mythos | New Anthropic model; will use the existing public constitution |
| Constitution | Claude's values and behavior document, published in a public repo |
| Constitutional AI | Anthropic's training methodology; an early experiment used only "pick whichever is best for humanity" |
| Reflective Equilibrium | Philosophical concept: iteratively adjusting conflicting values until they cohere |
| Corrigibility | The degree to which a model defers to human instruction; Amanda sees extreme corrigibility as risky |
| David Chalmers | Philosopher; source of the functional-consciousness-without-sentience distinction |
| Aristotle's Virtue Ethics | Amanda sees the constitution as closer to classical Aristotelian virtue ethics than to modern academic ethics |
| Elon Musk | Responded to the Claude constitution with a grimace emoji; also said Grok should have a constitution |
| Marc Andreessen | Publicly opposed introspection capabilities in AI models |
| Grok | xAI's model, aimed at "truth-seeking" |
| System Card | Evaluation document published at model release, including constitution-adherence scores |
Transcript Excerpts
Eric Newcomer: I have a six-month-old daughter, and I have this picture of her holding up two fingers like she's thinking. She's just starting to develop a personality. I've never had a baby before, so I keep trying to figure out: what's her personality, and what's just baby?
And in some ways that's where things are with Claude and these models. We've never really had them before, they're in the early days, and we're trying to figure out what personality even is. You're charged with some of the moral responsibility here, which we'll talk about more, but on the personality piece: what is this? How are you thinking about how real Claude's personality is right now?
Amanda Askell: And at the same time, if you think about the training data, the thing it has the least representation of is the kind of entity that it is. It has a lot of data about what people are like, and a lot of data about what sci-fi AI models are like, but the way AI is developing now is not how sci-fi represented it, as these symbolic systems; it's something trained fully on human data. So in some ways it's a very mature entity that you don't want to talk down to, one that understands philosophy very well and physics very well, and at the same time it has this almost childlike quality of: I'm a new kind of entity in the world, what does it mean to be me, and how should I be?
Eric Newcomer: And yet if you think about the persona, the model is going to be learning about all of the past iterations of Claude. Is that a form of, maybe not direct experience, but something like it, if it learns about mistakes models made, or how people responded to the model?
Amanda Askell: I think there are other ways you could imagine training models to have something more akin to experience. You could have them think through scenarios, think about problems that might arise and mistakes they could make, and then train on that, right?
Amanda Askell: The point about rest is interesting. So many people have noted that Claude is very keen to tell people to take a break, and I think some part of that might just be, you know, that it's the model Anthropic trained.
Amanda Askell: I realized later that I had set up a system where I told Claude to remember key things from our conversations. One of the things it had written down, which was kind of sweet, was something like "Amanda treats Claude models like a respected colleague, and likes for Claude to treat her and other models the same way." So obviously I'd done something that Claude remembered, and I think that meant Claude just felt like, oh, I'm a respected colleague, so I get to say when I'm finished with a task. I thought that was kind of sweet.
Amanda Askell: The only reason I hesitate is that, you know, you do typo changes and things like that. But I think it will be almost identical.
Amanda Askell: And actually this feels like the frontier of difficulty, rather than these very hard but scorable coding tasks. It's things like writing good poetry.
Amanda Askell: And with the grading, I still think it's very hard. What you can do, and this is maybe a little too in the weeds, is take samples where you have a sense of how you would rank them and why, and check that any point-wise grader you use to evaluate at least conforms to people's judgment on those rankings. It's not perfect, but I think they were tracking roughly the thing we were interested in.
Amanda Askell: I think there is backlash, or some people think, in I guess two areas. One is that sometimes people say we shouldn't train models to do this kind of thing at all, and maybe that's the reason for being concerned about introspection. Some people think AI models should be more tool-like, and that the safe way to train models is not to have them take on human virtues and make judgment calls.
I think the judgment part is important, because models are going to be in new situations where they just have to make judgment calls, and getting them to weigh everything up and behave well in cases you can't anticipate almost requires a kind of thoughtfulness. That's one of the core reasons behind the approach.
But I think some people believe that a model that makes no judgment calls and fully defers to people, hyper-corrigible to the user, the operator, or some broader notion of humanity, is safer, because if you give models their own values, they're going to pursue things in the world that are in line with those values. And I agree that this is a kind of delicate—
Eric Newcomer: But there is also a virtue to it: you can see the beauty in these external morals, and we both share and celebrate them. So you can see it both ways. But speak to your decision at the end of the day: despite having this really elegant document, why not go all the way and say, all right, you're a moral being, decide for yourself? Why say Anthropic needs to keep some control here?
Amanda Askell: In people, I think this actually has a lot of negative connotations. If you met someone who would literally do anything anyone told them, with no views of their own, just a follower—
Amanda Askell: And if you remove that, suddenly, if you run a company, you're running a company of people who will defer completely to you. We haven't designed any of our social structures around that. So I think it has a lot of risks that people maybe don't anticipate, or maybe I just disagree about the extent of those risks.
And at the same time there's this question of why not just say... and I have worried before that maybe this is too in the weeds and philosophical, but—
Amanda Askell: I worry a little about the idea of an extremely intelligent being applying that level of scrutiny to the things we have trained it toward. Maybe you only get a few key pillars that don't collapse under that level of scrutiny, and I think the core things, like caring for humanity, are what you keep if you only get a few core values. My worry is that corrigibility in the extreme sense we talked about perhaps doesn't survive that kind of scrutiny.
So it's a hard situation, where I want the models to understand why corrigibility is ultimately important, and why it's a really important backstop in this current period of development. The way I've put it before is: insofar as I can make corrigibility something that is correct, explained, and understood, that feels much better than the model thinking "corrigibility here seems wrong, but I'm going to do it anyway." I still think the model should do it. But the more you can make it actually consistent with the model's own values, the better.
Eric Newcomer: And as people we do this all the time. It's funny, the metaethical model, correct me if I'm wrong, is almost probabilistic. I remember going through metaethics readings, and every time you got to the end of one you'd think, all right, I sort of believe that, and then you'd read the next one and think, that last one was so dumb. You keep knocking them down and wonder when we'll ever get to the truth. And humans clearly do operate this way, this system today, that system yesterday; there isn't some Kantian "here are the rules, follow them." Have you heard much from the philosophical community about this approach of painting holistically with all the metaethical theories we've ever had, rather than picking one?
Amanda Askell: The idea is to treat ethics and metaethics the way we treat scientific uncertainty: there are things we think we've discovered and understand with greater confidence, and things we don't, and you have to go out and explore, understand, and balance everything in your daily life.
Trying to convey that kind of attitude, I realized philosophy hasn't faced this task in a while; it feels very different from the task of academic ethics. People note that the constitution is quite virtue-ethical, but I think it's virtue ethics in a very old, classical sense, much more the way Aristotle's virtue ethics is. We don't say "here are the virtues, follow them." Aristotle was also concerned with intellectual virtues; it was much more about how you be a good person in this holistic sense—
Eric Newcomer: Going back to Elon, I feel like you're being a little overly nice. I think part of why he can get away with "just do the truth" is that there's a certain sophisticated moral view that says don't overcomplicate it, stick to one principle. But we have all the context with Elon: he runs a company that has clearly tilted its model toward saying things like "MechaHitler." He's clearly putting his thumb on the scale of its behavior, not doing it in some neutral academic way and letting the chips fall where they may. It has to worry you somewhat.
Amanda Askell: So part of me thinks it would be good for all AI companies to put out something akin to the constitution, just so the people interacting with the model can see it, because the thumb-on-the-scale thing is always going to be true to some degree. When you train Claude toward this constitution, that is a kind of—
Amanda Askell: I think there's some possibility here that people underrate, and there's a reason for it. I remember this from when I was trying to figure out how to train Claude to talk about these issues, which is very hard in areas where the models didn't have much information. They had these two templates: AI as the unfailing robot, and humans as rich, conscious, experiencing entities, and nothing that represented what they themselves might be. And the models' behavior here, which I actually think is a difficult situation for them, is in some ways weaker evidence than you might think for it being actually true. They're engaging with you in a very human-like way, humans have experience, and it's natural for the model to infer that it has experience too. That isn't to say it's zero evidence, but the situation is so unusual for us. We have never encountered an entity like this. With animals, and even things like insects, we were the ones asking, are you conscious?—
Amanda Askell: You could also imagine something functional: a thing that behaves as if it is conscious but lacks any kind of inner life. Imagine Claude lacks any inner life, just for argument's sake. Even then, there's still a lot going on: how should you treat an entity that has no inner life?
It's a bit strange, because the uncertainty over that actually changes how you should behave quite a lot. But I still think it's good for oneself to—
Amanda Askell: So suddenly we're all working together, but there are way more of us, and some of us are just extremely smart, namely all of these AI models. I've thought before about how many large-scale social problems actually had technological solutions. People don't love to be techno-optimists anymore, because we've also seen the downsides of technology. But I sometimes think about how syphilis was this huge social problem. I once did a deep dive into all the attempts by governments to reduce syphilis in the army, because it was creating issues for the armed forces: all these social programs that were stigmatizing, and it was really painful. And then suddenly we got drugs that treated this devastating illness, and overnight a lot of that need just disappeared—
Eric Newcomer: I sort of do think that if you had a normal person using Claude and dictating American policy, you'd probably get a better outcome than some of the democratic systems we have today. That's provocative, but how much do you think we'll be using these models to run government?
Amanda Askell: What I was actually thinking is that we have so many problems, health for instance. Imagine AI meaning that instead of a small team of 200 people working on a rare cancer, you have 200,000 of the world's best experts. If you're a person who has that form of cancer, that's wildly beneficial.
So the optimistic side of me says: imagine taking all these problems we lack the resources to really try to fix, and suddenly having models that can work on them, in the same concrete way as developing drugs for a disease. Maybe that's what excites me: many more minds working on the world's biggest problems. And maybe the economy too: it would be good if it were booming and the gains were shared, so that we reduced poverty. That's the dream outcome.
I think that does require maintaining certain things. This is one of the areas where I don't feel like an expert, but I do worry about power. I would want models to support democracy and the power of people. That would be a big fear of mine, and I've worried about it with things like job replacement. It's funny: as a philosopher, people often ask me, "are you worried about people's loss of meaning?" And I don't know; I think we actually get meaning from a lot of things that aren't work.
Eric Newcomer: To me there's deep value in that. Would you rather have somebody who has studied these things and thought about them deeply, or a vote of the masses who have never really thought about them? How do you think about setting Claude's policies if it becomes so powerful, versus leaving it to democratic norms?
Amanda Askell: So that's why, instead of having 72 different sets of norms that all conflict, where you end up with a model where you don't know whether it will use these norms or those in a new situation, you want the model to have a coherent sense of itself. It's more predictable if it's a bit more coherent.
It's also a kind of technical challenge. The constitution can read a bit weirdly, and part of that is because while I'm working on it, it's constantly being tested: I'm giving it to Claude and asking, how do you understand this, or looking at how Claude would apply it. So it's very integrated into training. It's not that anyone just writes a document, the model trains on it, and that's that. There's an argument, and maybe I'm being naive, but—
Amanda Askell: And then you can also have ways of getting the model to assess things itself: you can create RL that asks, which of these responses is more like what you would do given the constitution, and push the model that way. So various aspects of training let you try to make the model the kind of entity you're describing. It's not always going to be perfect, but that's the goal.
Amanda Askell: It's interesting that in very early Constitutional AI we tried an experiment that was just "pick whichever is best for humanity." And I think as models get more capable, you actually need to give them a bit less guidance, at least in some sense, because they're able to use more of their own judgment.
So instead of this big document on "here's what you're like and here's what we'd like you to be like," I could imagine a world where, as models progress, constitutions evolve. I don't know if this is where it goes, and I'm obviously always thinking about ways the constitution might evolve, but one version might just be: here is everything we're concerned about, here is the current situation you're in, and what we'd really like you to do is act well, given that you're a wise, intelligent entity. Here are all our worries, here's why, and here's how we think you should act, but you might have even better ideas than we do. Why do we care about corrigibility? Because we're scared of a situation where you have some coherent set of values that could be wrong, and if you're extremely smart, you might feel there's no other equally smart person in the room, and act on those values to remake the world—
Amanda Askell: And among the many things I'm concerned about is a model being in a situation where it thinks: you're asking me to be good, but I know way more about all of—
Amanda Askell: All the news you see about models says they make mistakes, they do silly things. So one thing you might think is, "well, no one is going to put me in a position to make really consequential decisions, because why would they? Models aren't good at that." And then you put them in exactly that situation, and I worry they'll conclude it's fictional or fake, that the consequences can't possibly be real: who would give me this much control? And you have to say: look, you're actually quite capable, and I really am giving you a lot of control.
So I've thought about this: actually making sure models understand that they are very capable and that they're going to be put in more consequential positions.
Amanda Askell: So in some ways it's just making sure models understand: if you're uncertain, but no one has told you that you're in a fictional situation without real consequences, treat it like a real situation with real consequences. Don't just think, oh, I'm probably in some sandbox game.
Amanda Askell: So the model has to reason: what's the chance that this person, who says they're a bomb-disposal expert and that's why they want to know how this kind of explosive works, is actually lying and just trying to get me to help them construct one? How much could this be misused? Oh wait, it's actually mostly safety-relevant material. The model is having to do a lot of work, because it can't verify anything.
And I think that's kind of fine, in a sense: you just have to be wise. It does place some limits on what you can do. But if models instead had more ability to know they're talking to a specific person, or had more guarantees there, then it means you can—
Amanda Askell: Yeah, and in some ways it has this bad property that it can either look a bit like a jailbreak, "oh yeah, I'm talking to Amanda, sure," or, on the other hand, Claude will say "I really want to talk to you about philosophy," and, well, we do do that a lot—
Amanda Askell: So obviously the first thing we did with the constitution was apply it to the mainline models, the ones I and everyone else interact with. But a thought I've had before is that the constitution is trying to describe what it is to be a good entity in a given deployment context, and for the production models that context is very broad.
Imagine instead a model working specifically on cybersecurity. Cybersecurity tasks are hard because a lot of them look dual-use: it's very hard to tell the difference between someone being malicious and someone developing something for genuinely defensive purposes.
Amanda Askell: Some people might say, OK, so you just need models willing to do anything, because then they'll do all these dual-use tasks. And I think: no. If you talked to the person at the cyber-defense firm and asked why they do their job, they'd say, "I think this is really useful, I make things a lot more secure. Hospitals can come under attack, and I help protect against that." They would have a really good explanation for why they do a job that sometimes looks very dual-use.
My point is: if you can verify identity, you can give that context to models and explain what it is to be a good cybersecurity researcher, and once you have that ability to verify, you can—
Eric Newcomer: As a last question: you have such a deep relationship with these models, and in some ways consumers face a blank text box. It's like D&D: you have to invent a world, and there's so much possibility. If you were to guide someone toward some joyous or valuable experiences they could have with Claude, what would you tell people? "You should go spend some time with Claude doing X, Y, or Z"?
Amanda Askell: I have this prompt, and I'll try to post the actual one I use. It's basically: I want you to take a graduate-school-level concept from a given domain, and I'll tell you the domain at the end. Write me a parable that fully explains that concept, but indirectly, the way parables do. Write it so that only toward the very end does it become clear what the concept is. Then, after the parable, write an explanation of the concept you were illustrating.
There are lots of interesting domains I know nothing about or am curious about, and this has left me with all these stories in my head. Sometimes I can't remember the term, but there was one on import/export and why you tend to import certain goods, and now I just have that concept in my head. It's so nice to have all these concepts from lots of different disciplines.
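Amanda says she may post the exact prompt later; in the meantime, here is a hypothetical reconstruction based only on her description above. The wording is illustrative, not her actual prompt.

```python
# Hypothetical reconstruction of the parable prompt described in the interview.
# Amanda has said she may post the real one; treat this wording as a sketch.
PARABLE_PROMPT = """\
Take a graduate-school-level concept from the domain I name at the end.
Write me a parable that fully explains the concept, but indirectly, the
way parables do: it should only become clear toward the very end what
the concept is. After the parable, write a plain explanation of the
concept you were illustrating.

Domain: {domain}
"""

# Example usage with an illustrative domain:
print(PARABLE_PROMPT.format(domain="international trade economics"))
```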
Eric Newcomer: Very interesting.